Student name: Peng Li
Student ID: 24008144
Hi Teo,
may because of my computer setting problem (Chinese system), I can not directly read the "csv" type file into python with my computer, so I changed this file into "xlsx" type and put it into the "notebook" for reading it into python. So maybe you need to use this way to check my assginment.Many thanks.
We have dataset covers socio-economic data on New Zealand, stretching back to early 1980s. The data covers a range of topics: income and wealth distribution, poverty and deprivation levels, health measures, education outcomes, safety and security, housing as well as employment. There are approximately ~100 columns in the dataset. I will have data wrangling, exploratory data analysis (EDA)/data visualisation as well as detailed data analysis to uncover insights from a real-world dataset. And will try to figure out valuable conclusions and provide recommendations to policy makers for the future based on the analysis results.
1.shared_prosperity_assignment_dataset_mangled
2.Data documentation
Dataset sources: please see the "Data documentation"
1.There are 8 variable groups. For each variable group there are also many variables, so can we analyze and sub-group these variables in each group to find co-relations and potenial reasons for trends of variables, and figure out valuable conclusions?
2.Through investigation of the corelation between different vaiable groups, can help us to find potential reasons and consequences of the changes in each group?
1.Primary data wranging to prepare for the data analysis;
2.EDA and data Visulisation to get first impression and ideas of the data and their trend/distribution; Further data wranging;
3.Get the assumption based on EDA and visulisation. And generally define the analysis direction and methodology;
4.Deeper analysis based on step 3 plan, summarize the findings to support and adjust the assumption;
5.Analysis conclusion summary, and recommnedations for the policy makers for the future.
import os
import pandas as pd
import numpy as np
import matplotlib
import matplotlib.pyplot as plt
import seaborn as sns
sns.set(style="ticks")
from pylab import rcParams
import pylab
%matplotlib inline
matplotlib.style.use('ggplot')
# Set some Pandas options as you like
pd.set_option('display.notebook_repr_html', True)
#this line enables the plots to be embedded into the notebook
rcParams['figure.figsize'] = 15, 10
rcParams['font.size'] = 20
rcParams['axes.facecolor'] = 'white'
pd.set_option('display.max_columns', None)
-> Read the excel form (my computer can not read the "csv" type file and changed this file into "xlsx" type ) and replace '???' ,"-" and "null" to "None"
-> Sort by years
-> Delete all the '[a-zA-Z]' types in the form
-> Replace"\"
-> Replace '5.7\V' types
-> Change "" to NaN
-> Change "," in the number to "."
# replace '???' ,"-" and "null" to "None"
df = pd.read_excel('shared_prosperity_assignment_dataset_mangled.xlsx',na_values=['???',"-","None"])
# sort by years
df = df.sort_values(by='year')
# delete all the '[a-zA-Z]' types in the form
df = df.replace(to_replace=r'[a-zA-Z]', value='', regex=True)
# Replace"\"
df = df.replace(to_replace=("\\"), value=np.nan)
# Replace '5.7\V' types
df = df.applymap(lambda x: str(x).split('\\')[0])
# Change "" to NaN
df.replace(" ", "", inplace=True)
df.replace("", np.nan, inplace=True)
# Change "," in the number to "."
for col in df.columns:
df[col] = df[col].str.replace(',', '.')
df
| Q5:Q1 | D10:D1 | D10:D1-4(Palma) | P90:P10_bhc | P80:P20_bhc | P80:P50_bhc | P50:P20_bhc | GINI-BHC | top_10_perc_wealth_share | top_5_perc_wealth_share | top_1_perc_wealth_share | UN_middle_class_def_inc_share | OECD_middle_class_def_inc_share | LIH_B60_contemp_median_AHC | LIH_B50_contemp_median_AHC | LIH_B50_CV_AHC | LIH_B60_CV_AHC | LIH_B40_contemp_median_AHC | LIEH_B60_contemp_median_AHC | LIEH_B50_contemp_median_AHC | LIEH_B50_CV_median_AHC | LIEH_B60_CV_median_AHC | LIS65H_B60_CV_median_AHC | LISPH_B60_CV_median_AHC | LIS65H_B50_CV_median_AHC | LISPH_B50_CV_median_AHC | child_poverty_ahc_perc_HH_B50_CV | child_poverty_ahc_perc_HH_B60_CV | child_poverty_ahc_perc_HH_B40_contemp_median | child_poverty_ahc_perc_HH_B50_contemp_median | child_poverty_ahc_perc_HH_B60_contemp_median | child_poverty_ahc_perc_HH_B60_CV_one_or_more_FT_working_parents | child_poverty_ahc_perc_HH_B60_CV_one_or_more_PT_working_parents | rate_of_personal_insolvencies | loan_delinquency_percent | national_house_rental_A30%_income_spend | national_house_purchasing_A30%_income_spend | national_house_rental_HAM | national_house_purchasing_HAM | home_ownership_rate | homelessness_percentage_of_population | priority_A_state_housing_applicants_as_pop_percent | priority_B_state_housing_applicants_as_pop_percent | debt_servicing_as_percent_of_disp_inc | house_median_multiples | unemployment_rate | 60_64_years_unemp_rate | 65years_and_over_unemp_rate | underemployment_rate | employees_working_long_hours | labour_market_insecurity | long_term_unemployment_rate | pop_15_to_29_NEET | low_pay_percent_OECD_def | low_pay_percent_120percent_MW_def | living_min_wage_gap_magnitude | LIS_ms-16 | LP:RPW_ms-16 | teriary_education_participation_rate | education_spend_as_percent_of_GDP | education_spend_as_percent_of_total_gov_expenses | median_teriary_loan_as_perc_of_med_inc_AHC | median_teriary_loan_balance_as_perc_of_med_inc_AHC | university_affordability_ratio_to_mean_salary | polytechnics_affordability_ratio_to_mean_salary | wananga_affordability_ratio_to_mean_salary | bachelors_earning_premium_hourly | dips_cert_earning_premium_hourly | school_earning_premium_hourly | bachelors_earning_premium_weekly | dips_cert_earning_premium_weekly | school_earning_premium_weekly | health_expenditure_as_percent_of_gdp | health_expenditure_per_capita_PPP | depression_adult | excellent_very_good_or_good_self_rated_health | psychological_distress_adult | mood_anxiety_disorder_adult | healthy_weight_adult | unmet_after_hours_care_due_to_cost_adult | unmet_GP_care_due_to_cost_adult | veg_and_fruit_intake_adult | breakfast_eaten_at_home_less_5_days_a_week_child | emotional_behavioural_problems_child | adult_diabetes_rates | depression_child | excellent_very_good_or_good_parent_rated_health_child | unfilled_prescription_due_to_cost_child | unmet_need_for_after_hours_due_to_cost_child | unmet_need_for_GP_due_to_cost_child | vegetable_and_fruit_intake_child | healthy_weight_child | suicides_per_100000 | problem_gambling_intervention_prevelance_percent | total_prisoners_in_remand_rate | total_sentenced_prisoners_rate | total_post_sentence_offender_population_rate | violent_crime_victimisations_rate | recorded_murders_and_homicides_per_million | regional_gdp_proportional_variation | difference_in_percent_for_low_income_by_gender | gender_pay_gap | year | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 24 | 4.13 | 6.15 | 0.91 | 3.25 | 2.32 | 1.51 | 0.65 | 27.2 | nan | nan | nan | 22.2 | 63.3 | 9.0 | 6.0 | 8.0 | nan | 4.0 | 3.0 | 1.0 | 3.0 | nan | nan | nan | nan | nan | 11.0 | nan | 6.0 | 9.0 | 14.0 | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | NaN | nan | nan | NaN | nan | NaN | nan | nan | nan | nan | nan | nan | nan | NaN | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | 1982-12-31 00:00:00 |
| 33 | -1.5 | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | NaN | NaN | nan | NaN | nan | nan | nan | nan | NaN | nan | nan | nan | nan | nan | nan | nan | nan | NaN | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | 1983-12-31 00:00:00 |
| 22 | nan | nan | nan | 3.31 | 2.29 | 1.53 | 0.67 | 27.5 | nan | nan | nan | 22.3 | 62.5 | 9.0 | 6.0 | 9.0 | nan | 4.0 | 2.0 | 1.0 | 2.0 | nan | nan | nan | 1.11111111111111 | 3.0 | 14.0 | nan | 6.0 | 10.0 | 15.0 | 56.0 | 3.0 | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | NaN | nan | nan | nan | NaN | nan | nan | nan | NaN | nan | nan | nan | nan | nan | nan | NaN | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | 1984-12-31 00:00:00 |
| 16 | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | NaN | nan | NaN | nan | nan | nan | NaN | nan | nan | nan | nan | NaN | nan | NaN | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | 9.9951191236967 | nan | nan | nan | nan | nan | nan | nan | nan | nan | 1985-12-31 00:00:00 |
| 9 | 4.04 | 6.06 | 0.91 | 3.2 | 2.19 | 1.48 | 0.68 | 27.0 | nan | nan | nan | 22.8 | 64.6 | 7.0 | 5.0 | 7.0 | nan | 3.0 | 4.0 | 2.0 | 4.0 | nan | nan | nan | 1.42857142857142 | 3.14285714285714 | 11.0 | nan | 5.0 | 7.0 | 11.0 | 62.0 | 2.0 | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | NaN | 4.2 | 1.3 | 2 | nan | nan | nan | 7.93319426 | nan | nan | nan | nan | nan | nan | NaN | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | 12.2976089403982 | nan | nan | nan | nan | nan | nan | nan | nan | nan | 1986-12-31 00:00:00 |
| 4 | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | NaN | NaN | nan | 4.2 | 1.3 | 1.7 | nan | nan | nan | 10.56751469 | nan | nan | nan | nan | NaN | nan | NaN | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | 13.5716733442562 | nan | nan | nan | nan | nan | nan | nan | nan | nan | 1987-12-31 00:00:00 |
| 10 | nan | nan | nan | 3.11 | 2.24 | 1.49 | 0.67 | 27.1 | nan | nan | nan | 22.4 | 63.7 | 10.0 | 6.0 | 9.0 | nan | 4.0 | 6.0 | 2.0 | 5.0 | nan | nan | nan | 1.33333333333333 | 1.66666666666666 | 12.0 | nan | 5.0 | 8.0 | 13.0 | 61.0 | 5.0 | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | 5.8 | 2.3 | 2.3 | NaN | nan | nan | 13.48464633 | nan | nan | nan | nan | NaN | NaN | NaN | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | 13.9263361154962 | nan | nan | nan | nan | nan | nan | nan | 0.0 | nan | 1988-12-31 00:00:00 |
| 26 | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | 22.1 | 61.9 | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | 7.3 | 2.3 | 3.4 | nan | nan | nan | 17.44897948 | NaN | nan | NaN | nan | nan | NaN | NaN | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | 13.3658525452438 | nan | nan | nan | nan | nan | nan | nan | nan | nan | 1989-12-31 00:00:00 |
| 1 | 4.46 | 6.35 | 1.1 | 3.43 | 2.42 | 1.6 | 0.66 | 30.2 | nan | nan | nan | 21.3 | 59.5 | 11.0 | 6.0 | 10.0 | nan | 4.0 | 6.0 | 2.0 | 6.0 | nan | nan | nan | 1.5 | 2.5 | 15.0 | nan | 5.0 | 7.0 | 16.0 | 57.0 | 6.0 | nan | nan | nan | nan | nan | nan | nan | NaN | nan | nan | nan | nan | 8 | 3.3 | 2.1 | nan | nan | nan | 22.06366623 | nan | nan | nan | nan | nan | NaN | NaN | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | 12.9727691986114 | nan | nan | nan | nan | nan | nan | nan | 1.0 | nan | 1990-12-31 00:00:00 |
| 20 | nan | nan | nan | nan | nan | nan | nan | 101.5 | nan | nan | nan | 20.2 | 57.5999999999999 | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | 73.2978318752377 | nan | nan | nan | nan | nan | 10.6 | 2.8 | 2.5 | nan | nan | NaN | 24.32269694 | nan | nan | NaN | nan | nan | nan | NaN | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | 13.7012554208811 | nan | nan | nan | nan | nan | nan | nan | nan | nan | 1991-12-31 00:00:00 |
| 21 | nan | nan | nan | 3.8 | 2.53 | 1.64 | 0.65 | 31.9 | nan | nan | nan | 20.0 | 56.0999999999999 | 17.0 | 11.0 | 20.0 | nan | 7.0 | 3.0 | 1.0 | 6.0 | nan | nan | nan | 1.5 | 3.45 | 32.0 | nan | 9.0 | 17.0 | 27.0 | 34.0 | 6.0 | nan | nan | nan | nan | nan | nan | 72.6664662558244 | nan | nan | nan | nan | nan | 10.7 | 2.7 | 1.7 | nan | nan | nan | 32.37237262 | NaN | nan | nan | NaN | nan | nan | NaN | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | 14.1274508971636 | nan | nan | nan | nan | nan | nan | nan | 1.0 | nan | 1992-12-31 00:00:00 |
| 25 | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | 19.7 | 54.3 | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | 72.0415738678545 | nan | nan | nan | nan | NaN | 9.8 | 2.5 | 1.7 | nan | nan | nan | 33.7628865 | nan | NaN | NaN | nan | nan | nan | NaN | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | 12.458383791355 | nan | nan | nan | nan | nan | nan | nan | nan | nan | 1993-12-31 00:00:00 |
| 0 | 5.09 | 8.03 | 1.21 | 3.87 | 2.52 | 1.66 | 0.66 | 32.2 | nan | nan | nan | 19.9 | 54.9 | 19.0 | 13.0 | 22.0 | nan | 7.0 | 3.0 | 1.0 | 7.0 | nan | nan | nan | 1.36363636363636 | 3.27272727272727 | 34.0 | nan | 10.0 | 20.0 | 29.0 | 36.0 | 10.0 | nan | nan | nan | nan | nan | nan | 71.4222938333089 | nan | nan | nan | nan | NaN | 8.4 | 3.3 | 2.3 | nan | nan | NaN | 32.93502612 | nan | nan | nan | nan | nan | NaN | 7.3 | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | 14.1483858234262 | nan | nan | nan | nan | 8.18110779765849 | 17.1195051910757 | nan | 2.0 | nan | 1994-12-31 00:00:00 |
| 18 | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | 19.8 | 56.0999999999999 | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | 70.7897583844212 | nan | nan | nan | NaN | nan | 6.5 | 2.7 | 1.5 | nan | nan | nan | 25.84053796 | nan | nan | nan | NaN | nan | nan | 8.1 | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | 14.9690007098353 | nan | nan | nan | nan | 8.30902116546058 | 10.881984874041 | nan | nan | nan | 1995-12-31 00:00:00 |
| 31 | nan | nan | nan | 3.79 | 2.55 | 1.66 | 0.65 | 33.1 | nan | nan | nan | 19.8 | 55.2 | 18.0 | 13.0 | 20.0 | nan | 8.0 | 6.0 | 3.0 | 7.0 | nan | nan | nan | 1.45 | 3.7 | 31.0 | nan | 12.0 | 20.0 | 28.0 | 39.0 | 9.0 | nan | nan | nan | nan | nan | nan | 70.2044583274173 | NaN | nan | NaN | nan | nan | 6.3 | 2.9 | 1.4 | nan | nan | nan | 21.16650992 | nan | nan | nan | NaN | 0.573814338648227 | 0.982628356529864 | 8.200000000000001 | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | 14.6565329456752 | nan | nan | nan | nan | 8.33383325745199 | 14.7371692845238 | nan | 3.0 | nan | 1996-12-31 00:00:00 |
| 19 | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | 18.9 | 54.4 | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | 1.80253994264645 | nan | nan | nan | nan | 69.6332518337408 | nan | nan | nan | nan | nan | 6.8 | 3.3 | 1 | NaN | NaN | nan | 19.72672923 | nan | 13.27024441 | nan | NaN | 0.57740251663902 | 0.974400010975234 | 8.1 | 4.8 | 15.4 | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | 14.8084646746629 | nan | nan | nan | nan | 8.12861523819595 | 17.4532068945456 | nan | nan | 0.0 | 1997-12-31 00:00:00 |
| 11 | 5.26 | 8.66 | 1.31 | 3.68 | 2.59 | 1.65 | 0.64 | 33.0 | nan | nan | nan | 19.6 | 54.9 | 18.0 | 13.0 | 18.0 | nan | 9.0 | 9.0 | 4.0 | 8.0 | nan | nan | nan | 1.22222222222222 | 3.44444444444444 | 27.0 | nan | 14.0 | 20.0 | 28.0 | 40.0 | 11.0 | nan | 3.99920807760839 | nan | nan | nan | nan | 69.065934065934 | NaN | nan | nan | 8.3 | nan | 7.7 | 3.4 | 1.2 | NaN | nan | nan | 19.51951938 | nan | 13.12055621 | NaN | nan | 0.576954730149871 | 0.975691793249282 | 8.3 | 4.9 | 15.7 | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | 15.0998123109611 | nan | nan | nan | nan | 7.85463074584621 | 13.8937313057223 | nan | 3.0 | 16.2 | 1998-12-31 00:00:00 |
| 5 | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | 53.6 | 22.9 | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | 2.71186440677966 | nan | nan | nan | nan | 68.4968557711812 | NaN | nan | NaN | 7.7 | nan | 7 | 5.4 | 1.8 | nan | NaN | nan | 21.14427859 | nan | 12.28404701 | nan | NaN | 0.586462449836746 | 0.961097218813725 | 9.4 | 5.0 | 15.7 | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | 13.4286102047181 | nan | 0.190133687749198 | 0.187631928699867 | nan | 7.34162041018424 | 13.0378776421259 | nan | nan | 15.2 | 1999-12-31 00:00:00 |
| 12 | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | 2.25869066525498 | nan | nan | nan | nan | 67.919382732473 | nan | NaN | NaN | 9.4 | nan | 6.1 | 4.8 | 1.7 | NaN | nan | nan | 19.86564299 | nan | 11.73855902 | nan | nan | 0.566411227251697 | 1 | 10.1 | 5.0 | 16.4 | 25.68303400840658 | 39.93121895299962 | 5.559656080630483 | 5.368577686550889 | 3.859013985972417 | nan | nan | nan | nan | nan | nan | 7.5 | 1606.3 | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | 11.9287437378153 | nan | 0.204937568001657 | 0.183487902181234 | nan | 6.9796642661002 | 14.5175130463186 | nan | nan | 14.0 | 2000-12-31 00:00:00 |
| 36 | 5.42 | 8.32 | 1.34 | 3.91 | 2.68 | 1.66 | 0.62 | 33.8 | nan | nan | nan | 19.2 | 53.2 | 20.0 | 13.0 | 18.0 | 25.0 | 8.0 | 8.0 | 3.0 | 7.0 | nan | nan | nan | 1.55555555555555 | 3.88888888888888 | 28.0 | 37.0 | 11.0 | 21.0 | 30.0 | 42.0 | 12.0 | nan | 3.10981535471331 | nan | nan | nan | nan | 67.6567223025364 | 0.737103455373453 | nan | nan | 8.5 | 3.2040320403204 | 5.5 | 3.4 | 1.3 | nan | nan | NaN | 17.34475386 | nan | 12.22912863 | nan | nan | 0.55267784188686 | 1.02704117500685 | 11.0 | 5.1 | 16.8 | 25.97396449704142 | 39.08165680473373 | 5.551530319919428 | 4.960433429642005 | 1.794410239487352 | nan | nan | nan | nan | nan | nan | 7.6 | 1693.8 | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | 12.8634136139006 | nan | 0.224149021020402 | 0.18578742892428 | nan | 6.84050737129184 | 13.656238968722 | 0.54002586391768 | 4.0 | 13.1 | 2001-12-31 00:00:00 |
| 30 | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | 2.74933370739234 | nan | nan | nan | nan | 67.4682395644283 | NaN | 0.0359384490787608 | 0.358194978740635 | 9.4 | 3.30073667145237 | 5.3 | 4.0 | 1.7 | nan | NaN | nan | 14.76293119 | nan | 13.62982548 | nan | nan | 0.547495509288571 | 1.03656972147256 | 11.9 | 5.0 | 17.3 | 24.71911832236339 | 37.98561151079137 | 5.235648764126072 | 4.400569236548513 | 0.8853161140067168 | nan | nan | nan | nan | nan | nan | 7.9 | 1834.4 | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | 11.6259117209399 | nan | 0.219123304312613 | 0.176452723223324 | nan | 6.91683539177971 | 16.7138114327535 | 0.384999957053682 | nan | 12.3 | 2002-12-31 00:00:00 |
| 14 | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | 7.42680633897394 | 0.30474565675 | 0.373271672749999 | 0.673379796 | 0.76148819875 | 67.2781705841136 | nan | 0.0226183678029644 | 0.337537552449288 | 9.1 | 3.62093417328687 | 4.8 | 3.7 | 1.2 | NaN | nan | nan | 13.60946763 | nan | 13.2223601 | NaN | nan | 0.554726914369521 | 1.02331273604486 | 12.5 | 5.2 | 17.6 | 24.09006465036784 | 39.2851304153972 | 4.984461828692988 | 3.370603066499685 | 0.4781744069771662 | nan | nan | nan | nan | nan | nan | 7.7 | 1852.8 | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | 12.3596635005989 | nan | 0.224867790550438 | 0.176378578344961 | 0.132507386349529 | 6.80162375549321 | 11.423434430728 | 0.644498818603519 | nan | 12.5 | 2003-12-31 00:00:00 |
| 3 | 5.51 | 9.15 | 1.31 | 4.17 | 2.74 | 1.62 | 0.59 | 33.4 | 55.0 | 41.0 | 20.0 | 19.0 | 54.6 | 20.0 | 14.0 | 17.0 | 22.0 | 9.0 | 9.0 | 5.0 | 6.0 | nan | nan | nan | 1.58823529411764 | 3.23529411764705 | 23.0 | 31.0 | 11.0 | 19.0 | 28.0 | 45.0 | 12.0 | nan | 3.2520325203252 | 0.28247260575 | 0.440848749 | 0.6587368645 | 0.783494614749999 | 67.0872634143294 | nan | 0.032186269474405 | 0.367940910313791 | 10.3 | 4.02912516350947 | 4 | 2.5 | nan | 3.3 | nan | nan | 11.65048555 | 10.8 | 13.16200699 | NaN | nan | 0.55843653911839 | 1.01630054518792 | 13.1 | 5.2 | 18.1 | 23.50086655112652 | 37.45667244367418 | 5.143210079430139 | 3.051085776072352 | 0.5410778670206408 | nan | nan | nan | nan | nan | nan | 7.9 | 1985.5 | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | 11.740197271122 | 0.1 | 0.243818328564091 | 0.202435982096999 | 0.171839459975053 | 6.18844620539535 | 11.4988929257342 | 0.401650925416796 | 0.0 | 12.7 | 2004-12-31 00:00:00 |
| 34 | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | 2.432331295996 | 0.26788212875 | 0.496710366 | 0.649182601 | 0.800634981749999 | 66.898247771288 | nan | 0.0268133462282398 | 0.345454545454545 | 11.6 | 4.48167866631891 | 3.8 | 2.1 | 1.5 | 2.8 | 15.6518062083016 | nan | 9.84308144 | 11.1 | 12.43929872 | nan | nan | 0.560911148063158 | 1.01453312786998 | 13.5 | 5.1 | 17.7 | 23.29185952694382 | 38.68104802751709 | 5.134079474249958 | 3.075128288143941 | 0.6006994713287425 | nan | nan | nan | nan | nan | nan | 8.3 | 2123.6 | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | 12.1861816669171 | 0.000784090909090909 | 0.220357833655706 | 0.21948742746615 | 0.184163442940038 | 6.26428916827853 | 14.7549350420031 | 0.388228129128096 | nan | 14.0 | 2005-12-31 00:00:00 |
| 27 | nan | nan | nan | nan | nan | nan | nan | nan | 57.0 | 43.0 | 21.0 | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | 11.9236 | 1.92239650902229 | 0.26140010425 | 0.54307055475 | 0.64662837475 | 0.816000718 | 66.6141159349987 | 0.795522423721119 | 0.0262585716675029 | 0.338374787948295 | 12.3 | 4.69563924509521 | 3.9 | 1.9 | 1.7 | 2.8 | 14.9316199062892 | nan | 7.649513333 | 10.7 | 14.61223605 | 17.9 | nan | 0.575006286281902 | 0.989086530774958 | 13.3 | 6.1 | 20.1 | 23.57814972104255 | 36.55591639603631 | 5.160616481113521 | 3.331011556808302 | 0.5746032019552264 | 64.2857142857143 | 33.92857142857143 | 7.142857142857142 | 152.7331189710611 | 92.92604501607717 | 9.003215434083602 | 8.6 | 2396.5 | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | 12.2488000823393 | 0.000884524406852555 | 0.266002437101283 | 0.220916063364633 | 0.17962870045158 | 6.35789549136262 | 11.7102175232651 | 0.469300100802692 | nan | 12.1 | 2006-12-31 00:00:00 |
| 23 | 5.31 | 8.1 | 1.2 | 4.13 | 2.59 | 1.61 | 0.62 | 32.0 | nan | nan | nan | 20.2 | 59.1 | nan | nan | nan | nan | nan | nan | nan | nan | nan | 2.0 | 3.16666666666666 | 2.23076923076923 | 3.53846153846153 | nan | nan | nan | nan | nan | 32.0 | 13.0 | 0.119993688 | 2.8698553948832 | 0.26473952575 | 0.62684495675 | 0.6438993085 | 0.84298999275 | 66.3118871083472 | nan | 0.0247740286782452 | 0.325517012919407 | 13.1 | 5.00007544827219 | 3.6 | 1.2 | nan | 3.1 | 14.74332887088 | 3.25044918060302 | 5.96797682899999 | 10.9 | 13.44092008 | nan | nan | 0.576097367526449 | 0.987016986408988 | 13.1 | 5.4 | 17.2 | 23.963735859844 | 36.08036917548087 | 5.204035084141148 | 3.487518095478457 | 0.600718653487315 | 63.42141863699582 | 37.69123783031988 | 7.440890125173842 | 159.6153846153846 | 105.1282051282051 | 7.692307692307693 | 8.3 | 2440.0 | 10.4 | 89.6 | 6.6 | 12.7 | 36.2 | nan | 111.3 | 43.0 | 7.4 | 1.8 | 5.1 | 0.2 | 97.6 | nan | nan | nan | nan | 67.7 | 11.0436982895819 | 0.00125289858501727 | 0.274998816904074 | 0.226018645591784 | 0.188396195163503 | 6.18311958733614 | 12.0750642819598 | 0.587194516336476 | 2.0 | 11.9 | 2007-12-31 00:00:00 |
| 17 | 5.34 | 8.53 | 1.28 | 4.01 | 2.57 | 1.61 | 0.62 | 33.3 | 55.0 | 41.0 | 19.0 | 19.8 | 59.0 | 19.0 | 13.0 | 13.0 | 18.0 | 9.0 | 14.0 | 7.0 | 7.0 | 14.0 | nan | nan | nan | nan | 17.0 | 24.0 | 10.0 | 18.0 | 25.0 | nan | nan | 0.176417361 | 8.92911315603195 | 0.26500752175 | 0.615998379 | 0.63594779925 | 0.83271986375 | 66.0128788326342 | nan | 0.0339981229469732 | 0.29763022055373 | 13.8 | 4.70807624374826 | 4 | 2.1 | 1.1 | 3.2 | 14.1537562670152 | 3.38942289352417 | 4.326328892 | 11.1 | 12.53418077 | NaN | nan | 0.566552622803064 | 1.00907681704586 | 12.4 | 5.1 | 16.8 | 24.174053182917 | 36.36180499597099 | 5.275847049000189 | 3.652069417554473 | 0.4670322292243831 | 67.34426229508198 | 32.06557377049181 | 4.786885245901642 | 151.1904761904762 | 92.26190476190477 | 3.571428571428571 | 9.1 | 2718.2 | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | 11.770168989926 | 0.00133974659784138 | 0.269943688409197 | 0.191459408728296 | 0.176044110746128 | 6.11353824495542 | 11.9720839172658 | 0.438598158892653 | 2.0 | 12.5 | 2008-12-31 00:00:00 |
| 15 | 5.34 | 8.53 | 1.28 | 3.98 | 2.55 | 1.58 | 0.63 | 32.9 | nan | nan | nan | 20.0 | 57.6 | 19.0 | 14.0 | 13.0 | 18.0 | 9.0 | 14.0 | 6.0 | 5.0 | 11.0 | 1.76470588235294 | 3.05882352941176 | 2.16666666666666 | 3.58333333333333 | 18.0 | 25.0 | 11.0 | 20.0 | 27.0 | 33.0 | 15.0 | 0.197668338 | 24.0755627009646 | 0.27398754475 | 0.547441367249999 | 0.645163383 | 0.806098038 | 65.7087560846871 | nan | 0.0409533322492973 | 0.292805872377987 | 10.8 | 4.65229068421294 | 5.8 | 2.9 | 1.4 | 4.3 | 13.4355539765084 | 5.2327537536621 | 6.413301781 | 14.1 | 12.87324359 | 24 | nan | 0.576229724523685 | 0.995758226068067 | 12.4 | 6.0 | 17.9 | 23.59515798429826 | 35.76594345825115 | 5.277309543396074 | 3.72571715416637 | 0.5642720403933268 | 62.5 | 25.875 | 6.25 | 135.8662613981763 | 103.951367781155 | 11.24620060790274 | 9.7 | 2954.5 | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | 11.329095033648 | 0.00226323491834885 | 0.288856884015888 | 0.202234662826081 | 0.1886454969918 | 6.31310831842783 | 15.8047279378781 | 0.541976273612743 | 1.0 | 11.5 | 2009-12-31 00:00:00 |
| 28 | 5.33 | 8.62 | 1.25 | 3.93 | 2.56 | 1.56 | 0.61 | 32.4 | 54.0 | 39.0 | 18.0 | 20.1 | 60.0999999999999 | 20.0 | 14.0 | 12.0 | 17.0 | 9.0 | 13.0 | 6.0 | 5.0 | 10.0 | 1.66666666666666 | 3.38888888888888 | 2.16666666666666 | 3.58333333333333 | 17.0 | 25.0 | 12.0 | 20.0 | 28.0 | 29.0 | 15.0 | 0.17039676 | 10.7336811841124 | 0.282854103 | 0.56577382025 | 0.6630272715 | 0.819475144499999 | 65.4045250974233 | nan | 0.0513439007580978 | 0.296577073282793 | 10.1 | 4.77659693016309 | 6.1 | 3.0 | 1.5 | 3.9 | 13.7687970059716 | 5.60113716125488 | 8.879184954 | 13.4 | 12.72185373 | NaN | nan | 0.556224675163015 | 1.03122755980926 | 12.2 | 6.0 | 18.3 | 24.4140625 | 35.73069852941176 | 5.462299069965385 | 3.920145171817537 | 0.6167547991979276 | 63.78881987577639 | 28.07453416149068 | 5.590062111801233 | 146.9512195121951 | 104.2682926829268 | 4.573170731707317 | 9.7 | 3006.4 | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | 11.8071897448967 | 0.00304249942568343 | 0.290902825637491 | 0.206753962784286 | 0.199586492074431 | 5.98630829313117 | 9.65370771331246 | 0.561874204800346 | 2.0 | 10.8 | 2010-12-31 00:00:00 |
| 2 | 5.94 | 9.75 | 1.44 | 4.26 | 2.67 | 1.64 | 0.62 | 35.1 | nan | nan | nan | 18.9 | 57.5 | 20.0 | 14.0 | 13.0 | 18.0 | 9.0 | 11.0 | 5.0 | 5.0 | 9.0 | 1.94736842105263 | 3.21052631578947 | 2.14285714285714 | 3.57142857142857 | 18.0 | 25.0 | 12.0 | 20.0 | 28.0 | 25.0 | 13.0 | 0.140193975 | 7.11753731343283 | 0.286207073 | 0.522625904 | 0.672104344 | 0.80877855875 | 65.1031136271734 | nan | 0.0561977064952237 | 0.252650297517269 | 9 | 4.70306362317849 | 6 | 2.9 | 1.7 | 4 | 13.2852367640578 | 5.59156560897827 | 8.914450139 | 13.2 | 13.69347462 | nan | nan | 0.556565360882348 | 1.03070274869252 | 11.0 | 5.7 | 16.6 | 26.76747526332588 | 40.40057452920524 | 5.625365837737466 | 4.116263815877027 | 0.6335082681272718 | 60.90584028605481 | 25.14898688915375 | 4.290822407628122 | 146.4071856287425 | 98.20359281437125 | 9.281437125748502 | 9.6 | 3106.5 | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | 10.9000006714222 | 0.00275630941796046 | 0.284385472949866 | 0.199165583749401 | 0.213596881198276 | 5.83858833185144 | 8.43968476636899 | 0.445194165465418 | 1.0 | 10.3 | 2011-12-31 00:00:00 |
| 8 | 5.28 | 8.23 | 1.23 | 4.01 | 2.61 | 1.65 | 0.63 | 32.4 | nan | nan | nan | 20.1 | 58.7 | 19.0 | 14.0 | 13.0 | 18.0 | 10.0 | 10.0 | 6.0 | 5.0 | 9.0 | 1.64705882352941 | 3.52941176470588 | 1.91666666666666 | 4.33333333333333 | 19.0 | 25.0 | 13.0 | 20.0 | 27.0 | 28.0 | 15.0 | 0.118259932 | 6.75171736997056 | 0.28689237275 | 0.48016812525 | 0.662568357 | 0.7887420035 | 64.8056861171615 | nan | 0.0589248872061124 | 0.170154397261205 | 8.7 | 4.8046064864858 | 6.4 | 3.9 | 1.6 | 4.2 | 13.2635662758451 | 5.8853006362915 | 13.29677878 | 13.5 | 13.55530034 | 23.9 | nan | 0.553735337412133 | 1.03746379202201 | 10.8 | 5.4 | 16.9 | 26.99841594869219 | 41.93099717961596 | 5.739509808579469 | 3.995548849960162 | 0.5089958053256108 | 59.09610983981693 | 20.13729977116704 | 1.716247139588105 | 143.3526011560694 | 92.48554913294798 | 0.2890173410404624 | 9.7 | 3156.8 | 14.3 | 89.3 | 4.5 | 16.3 | 34.3 | 6.7 | 13.6 | 44.5 | 8.7 | 3.3 | 5.5 | 0.3 | 97.8 | 6.6 | 4.5 | 4.7 | 55.5 | 64.5 | 12.3 | 0.00268596821366223 | 0.272065658512254 | 0.183304237422631 | 0.194254880177749 | 5.24748906069331 | 10.2089661947099 | 0.493306941151519 | 1.0 | 9.1 | 2012-12-31 00:00:00 |
| 7 | 5.38 | 8.29 | 1.3 | 4.02 | 2.62 | 1.64 | 0.63 | 33.4 | nan | nan | nan | 19.7 | 58.3 | 18.0 | 14.0 | 12.0 | 17.0 | 10.0 | 10.0 | 5.0 | 4.0 | 9.0 | 1.875 | 3.5 | 2.0 | 4.0 | 18.0 | 23.0 | 13.0 | 19.0 | 24.0 | 35.0 | 12.0 | 0.100523499 | 3.52633477633477 | 0.291834894 | 0.50975419275 | 0.66128961525 | 0.7931701685 | 64.4122657580919 | 0.914340971956731 | 0.116648300987249 | 0.180403445251534 | 8.5 | 5.03792323243437 | 5.8 | 3.6 | 1.6 | 4.1 | 14.0851526460705 | 5.33285951614379 | 12.17257318 | 11.9 | 14.08030591 | nan | 1.79347826086956 | 0.563012705025085 | 1.02123083663276 | 10.6 | 5.7 | 17.9 | 27.36466607825831 | 42.8471609296852 | 5.834845910325214 | 3.934648826896657 | 0.4902376499838548 | 59.83333333333333 | 29.16666666666667 | 3.499999999999995 | 162.3931623931624 | 94.01709401709401 | 9.971509971509972 | 9.4 | 3346.4 | 14.5 | 89.5 | 6.1 | 16.4 | 33.7 | 7.2 | 14.5 | 43.3 | 8.7 | 4.4 | 5.8 | 0.5 | 98.2 | 4.5 | 4.3 | 6.5 | 54.9 | 63.4 | 11.0 | 0.00279713045629343 | 0.264803112420446 | 0.181820226235185 | 0.194773652371421 | 5.23059797152944 | 10.8063568394107 | 0.512603102046059 | 2.0 | 11.2 | 2013-12-31 00:00:00 |
| 29 | 5.8 | 9.59 | 1.36 | nan | 2.78 | 1.66 | 0.6 | 34.2 | nan | nan | nan | 19.0 | 56.2 | 19.0 | 14.0 | 12.0 | 16.0 | nan | 12.0 | 6.0 | 4.0 | 8.0 | 2.05882352941176 | 3.47058823529411 | 1.69230769230769 | 3.92307692307692 | 17.0 | 23.0 | nan | 20.0 | 29.0 | nan | nan | 0.100758708 | 2.78443361150128 | 0.29141248675 | 0.534556459999999 | 0.6453577715 | 0.7959981185 | 64.0002245424946 | nan | 0.189426336664377 | 0.176375440384658 | 9.4 | 5.30111255886082 | 5.4 | 2.9 | 1.6 | 4.1 | 14.0320892397174 | 4.76441287994384 | 13.63636376 | 11.4 | 13.86467017 | nan | 2.04654255319149 | 0.54257237321901 | 1.05957281112658 | 10.2 | 5.2 | 17.3 | 27.84883300816533 | 44.64195389840307 | 5.959064936999185 | 3.702775588847961 | 0.4748641862326134 | 66.66666666666666 | 31.11111111111112 | 2.777777777777778 | 161.4325068870523 | 110.7438016528926 | 5.785123966942149 | 9.4 | 3453.3 | 15.5 | 91.4 | 6.2 | 18.6 | 33.4 | 7.0 | 14.0 | 41.5 | 8.4 | 4 | 5.4 | 0.7 | 98.4 | 3.9 | 3.9 | 5.3 | 53.4 | 62.6 | 11.73 | 0.00279785513283552 | 0.239037468702222 | 0.163723383040482 | 0.190777957501495 | 5.17941104783851 | 9.53559437799096 | 0.570174725178598 | 2.0 | 9.9 | 2014-12-31 00:00:00 |
| 13 | 5.88 | 9.82 | 1.45 | 4.01 | 2.61 | 1.61 | 0.62 | 35.0 | 59.0 | 45.0 | 22.0 | 19.1 | 57.6 | 20.0 | 15.0 | 12.0 | 16.0 | 10.0 | 13.0 | 7.0 | 4.0 | 8.0 | 2.3125 | 3.0 | 2.81818181818181 | 3.18181818181818 | 16.0 | 22.0 | 12.0 | 20.0 | 28.0 | 34.0 | 13.0 | 0.10844285 | 2.86082681924409 | 0.2846640175 | 0.508278942499999 | 0.62552256425 | 0.77981667375 | 63.5854497061106 | NaN | 0.18291913986911 | 0.165807840323527 | 8.8 | 5.50522662693445 | 5.4 | 3.1 | 1.5 | 3.9 | 13.6141930131136 | 4.68318223953247 | 13.20604627 | 11.3 | 13.89062945 | 24.9 | 2.45194805194805 | 0.556715416157437 | 1.03172792356201 | 9.8 | 5.3 | 17.8 | 27.99002738956388 | 45.18926890933352 | 6.057549384693136 | 3.692838567263505 | 0.4592918261953701 | 63.13513513513514 | 29.72972972972973 | 2.702702702702703 | 157.1045576407507 | 103.7533512064343 | 7.238605898123325 | 9.3 | 3530.1 | 14.6 | 88.9 | 6.2 | 17.4 | 33.1 | 5.8 | 13.7 | 40.5 | 8.6 | 4.0 | 6.1 | 1.1 | 98 | 5.2 | 3.3 | 6.1 | 54.8 | 63.1 | 12.27 | 0.00277020416150283 | 0.248255169264888 | 0.165807840323527 | 0.187463309634074 | 5.53688604787685 | 10.4434552171586 | 0.451131307605537 | 1.0 | 11.8 | 2015-12-31 00:00:00 |
| 32 | 5.61 | 9.1 | 1.36 | 4.14 | 2.59 | 1.62 | 0.63 | 33.8 | nan | nan | nan | 19.6 | 57.3 | 20.0 | 14.0 | 11.0 | 15.0 | 10.0 | 14.0 | 7.0 | 5.0 | 8.0 | 2.14285714285714 | 3.92857142857142 | 2.45454545454545 | 3.72727272727272 | 14.0 | 20.0 | 13.0 | 19.0 | 25.0 | nan | nan | 0.096088133 | 3.77506881635863 | 0.2839379445 | 0.534711984749999 | 0.607017201499999 | 0.78210112725 | 63.1780402449693 | nan | 0.169722133503673 | 0.12360268284893 | 8.2 | 5.79740687238183 | 5.1 | 3.0 | 1.2 | 4.3 | 15.0208548992001 | 4.4 | 14.14554371 | 12 | 11.18821004 | nan | 2.23358585858585 | 0.556034802420428 | 1.03252390867511 | 9.4 | 5.2 | 17.8 | 27.80509611086276 | nan | 6.2 | 3.6 | 0.4 | 55.3213367609255 | 28.43187660668381 | 2.827763496143962 | 160.9756097560976 | 111.6531165311653 | 16.53116531165312 | nan | nan | 15.4 | 87.8 | 6.8 | 18.8 | 32 | 6.9 | 14.3 | 40.1 | 10.3 | 4.3 | 5.8 | 0.3 | 97.7 | 3.8 | 4 | 4.5 | 48.5 | 63.9 | 12.33 | 0.0026445225167678 | 0.281635260300223 | 0.17943149153625 | 0.200894282976684 | 5.67267113808155 | 10.6544382127819 | 0.321063824979607 | 2.0 | 12.0 | 2016-12-31 00:00:00 |
| 6 | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | 1.04948674735713 | 0.286440483 | 0.574313678 | 0.607096032 | 0.798594578 | nan | nan | 0.252376980817347 | 0.125333611342785 | 8.3 | 6.10482289408685 | 4.7 | 3.1 | 1.2 | 4.4 | nan | nan | 15.64339767 | 11.8 | 12.09159648 | NaN | nan | NaN | nan | NaN | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | 16.7 | 88.2 | 7.6 | 19.9 | 31.9 | 6.6 | 14.3 | 38.8 | 9.3 | 4.9 | 5.6 | 0.5 | 98.1 | 3.9 | 2.6 | 3 | 49.8 | 62.5 | 12.64 | 0.00242055879899916 | 0.307631359466221 | 0.191909924937447 | 0.212364470391993 | nan | 7.3008688033876 | nan | nan | nan | 2017-12-31 00:00:00 |
| 35 | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | NaN | nan | nan | 6.23974766855127 | nan | nan | nan | nan | nan | nan | nan | NaN | nan | nan | nan | nan | NaN | NaN | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | 2018-12-31 00:00:00 |
print(df.dtypes)
Q5:Q1 object
D10:D1 object
D10:D1-4(Palma) object
P90:P10_bhc object
P80:P20_bhc object
...
recorded_murders_and_homicides_per_million object
regional_gdp_proportional_variation object
difference_in_percent_for_low_income_by_gender object
gender_pay_gap object
year object
Length: 103, dtype: object
numericcolumns = df.columns[:-1]
df[numericcolumns] = df[numericcolumns].apply(pd.to_numeric,errors='coerce')
print(df.dtypes)
Q5:Q1 float64
D10:D1 float64
D10:D1-4(Palma) float64
P90:P10_bhc float64
P80:P20_bhc float64
...
recorded_murders_and_homicides_per_million float64
regional_gdp_proportional_variation float64
difference_in_percent_for_low_income_by_gender float64
gender_pay_gap float64
year object
Length: 103, dtype: object
column_names = df.columns.tolist()
print("All column names:", column_names)
All column names: ['Q5:Q1', 'D10:D1', 'D10:D1-4(Palma)', 'P90:P10_bhc', 'P80:P20_bhc', 'P80:P50_bhc', 'P50:P20_bhc', 'GINI-BHC', 'top_10_perc_wealth_share', 'top_5_perc_wealth_share', 'top_1_perc_wealth_share', 'UN_middle_class_def_inc_share', 'OECD_middle_class_def_inc_share', 'LIH_B60_contemp_median_AHC', 'LIH_B50_contemp_median_AHC', 'LIH_B50_CV_AHC', 'LIH_B60_CV_AHC', 'LIH_B40_contemp_median_AHC', 'LIEH_B60_contemp_median_AHC', 'LIEH_B50_contemp_median_AHC', 'LIEH_B50_CV_median_AHC', 'LIEH_B60_CV_median_AHC', 'LIS65H_B60_CV_median_AHC', 'LISPH_B60_CV_median_AHC', 'LIS65H_B50_CV_median_AHC', 'LISPH_B50_CV_median_AHC', 'child_poverty_ahc_perc_HH_B50_CV', 'child_poverty_ahc_perc_HH_B60_CV', 'child_poverty_ahc_perc_HH_B40_contemp_median', 'child_poverty_ahc_perc_HH_B50_contemp_median', 'child_poverty_ahc_perc_HH_B60_contemp_median', 'child_poverty_ahc_perc_HH_B60_CV_one_or_more_FT_working_parents', 'child_poverty_ahc_perc_HH_B60_CV_one_or_more_PT_working_parents', 'rate_of_personal_insolvencies', 'loan_delinquency_percent', 'national_house_rental_A30%_income_spend', 'national_house_purchasing_A30%_income_spend', 'national_house_rental_HAM', 'national_house_purchasing_HAM', 'home_ownership_rate', 'homelessness_percentage_of_population', 'priority_A_state_housing_applicants_as_pop_percent', 'priority_B_state_housing_applicants_as_pop_percent', 'debt_servicing_as_percent_of_disp_inc', 'house_median_multiples', 'unemployment_rate', '60_64_years_unemp_rate', '65years_and_over_unemp_rate', 'underemployment_rate', 'employees_working_long_hours', 'labour_market_insecurity', 'long_term_unemployment_rate', 'pop_15_to_29_NEET', 'low_pay_percent_OECD_def', 'low_pay_percent_120percent_MW_def', 'living_min_wage_gap_magnitude', 'LIS_ms-16', 'LP:RPW_ms-16', 'teriary_education_participation_rate', 'education_spend_as_percent_of_GDP', 'education_spend_as_percent_of_total_gov_expenses', 'median_teriary_loan_as_perc_of_med_inc_AHC', 'median_teriary_loan_balance_as_perc_of_med_inc_AHC', 'university_affordability_ratio_to_mean_salary', 'polytechnics_affordability_ratio_to_mean_salary', 'wananga_affordability_ratio_to_mean_salary', 'bachelors_earning_premium_hourly', 'dips_cert_earning_premium_hourly', 'school_earning_premium_hourly', 'bachelors_earning_premium_weekly', 'dips_cert_earning_premium_weekly', 'school_earning_premium_weekly', 'health_expenditure_as_percent_of_gdp', 'health_expenditure_per_capita_PPP', 'depression_adult', 'excellent_very_good_or_good_self_rated_health', 'psychological_distress_adult', 'mood_anxiety_disorder_adult', 'healthy_weight_adult', 'unmet_after_hours_care_due_to_cost_adult', 'unmet_GP_care_due_to_cost_adult', 'veg_and_fruit_intake_adult', 'breakfast_eaten_at_home_less_5_days_a_week_child', 'emotional_behavioural_problems_child', 'adult_diabetes_rates', 'depression_child', 'excellent_very_good_or_good_parent_rated_health_child', 'unfilled_prescription_due_to_cost_child', 'unmet_need_for_after_hours_due_to_cost_child', 'unmet_need_for_GP_due_to_cost_child', 'vegetable_and_fruit_intake_child', 'healthy_weight_child', 'suicides_per_100000', 'problem_gambling_intervention_prevelance_percent', 'total_prisoners_in_remand_rate', 'total_sentenced_prisoners_rate', 'total_post_sentence_offender_population_rate', 'violent_crime_victimisations_rate', 'recorded_murders_and_homicides_per_million', 'regional_gdp_proportional_variation', 'difference_in_percent_for_low_income_by_gender', 'gender_pay_gap', 'year']
df['year'] = pd.to_datetime(df['year'])
selected_columns = ['Q5:Q1', 'D10:D1', 'D10:D1-4(Palma)', 'P90:P10_bhc', 'P80:P20_bhc', 'P80:P50_bhc', 'P50:P20_bhc', 'GINI-BHC', 'top_10_perc_wealth_share', 'top_5_perc_wealth_share', 'top_1_perc_wealth_share', 'UN_middle_class_def_inc_share', 'OECD_middle_class_def_inc_share']
for column in selected_columns:
plt.plot(df['year'], df[column], marker='o', linestyle='-', label=column)
plt.xlabel('Year',fontsize=16)
plt.ylabel('Value',fontsize=16)
plt.title('Income and wealth',fontsize=22)
plt.legend()
plt.grid(True)
plt.show()
# from the chart we see four obvious wrong numbers, so will change them to NaN
df.loc[33, 'Q5:Q1'] = np.nan
df.loc[20, 'GINI-BHC'] = np.nan
df.loc[5, 'UN_middle_class_def_inc_share']= np.nan
df.loc[5, 'OECD_middle_class_def_inc_share'] = np.nan
#[5, 'UN_middle_class_def_inc_share'] and [5, 'OECD_middle_class_def_inc_share'] wrong numbers could be opposite as they are in same row and meets each ohter level.
df['year'] = pd.to_datetime(df['year'])
selected_columns = ['Q5:Q1', 'D10:D1', 'D10:D1-4(Palma)', 'P90:P10_bhc', 'P80:P20_bhc', 'P80:P50_bhc', 'P50:P20_bhc', 'GINI-BHC', 'top_10_perc_wealth_share', 'top_5_perc_wealth_share', 'top_1_perc_wealth_share', 'UN_middle_class_def_inc_share', 'OECD_middle_class_def_inc_share']
for column in selected_columns:
plt.plot(df['year'], df[column], marker='o', linestyle='-', label=column)
plt.xlabel('Year',fontsize=16)
plt.ylabel('Value',fontsize=16)
plt.title('Income and wealth',fontsize=18)
plt.legend()
plt.grid(True)
plt.show()
df['year'] = pd.to_datetime(df['year'])
selected_columns = ['Q5:Q1', 'D10:D1', 'D10:D1-4(Palma)']
for column in selected_columns:
plt.plot(df['year'], df[column], marker='o', linestyle='-', label=column)
plt.xlabel('Year',fontsize=16)
plt.ylabel('Value',fontsize=16)
plt.title('from decile side',fontsize=18)
plt.legend()
plt.grid(True)
plt.show()
-> The three aviables all show the difference of richest group to poorest group. The whole trend shows that the gap is becoming bigger.
-> Decreased gap at the end of 2007 and at the end of 2012, this maybe related to financial crises, which had higher influences on richest group.
-> Decreased gap at the end of 2016 is strange without finding the reason.
df['year'] = pd.to_datetime(df['year'])
selected_columns = [ 'top_10_perc_wealth_share', 'top_5_perc_wealth_share', 'top_1_perc_wealth_share']
for column in selected_columns:
plt.plot(df['year'], df[column], marker='o', linestyle='-', label=column)
plt.xlabel('Year',fontsize=16)
plt.ylabel('Value',fontsize=16)
plt.title('data inside top 10%',fontsize=18)
plt.legend()
plt.grid(True)
plt.show()
-> not so many data, but similar trends of the three variables. not so big changes from 2005 to 2016;
-> the three variables is related to 'Q5:Q1', 'D10:D1', 'D10:D1-4(Palma)', further study the wealth distribution in the top 10%;
-> although not so many data, it is still suprised to see that the top 10% occupy more than half the wealth! If we compared to the income data"'D10:D1'" which is around 9, we can see that the richest 10% accumate much more wealth if we compare to the income difference. As the poor group need to use most of their income for daily life, without not so much saving or wealth.
df['year'] = pd.to_datetime(df['year'])
selected_columns = [ 'P90:P10_bhc', 'P80:P20_bhc', 'P80:P50_bhc', 'P50:P20_bhc']
for column in selected_columns:
plt.plot(df['year'], df[column], marker='o', linestyle='-', label=column)
plt.xlabel('Year',fontsize=16)
plt.ylabel('Value',fontsize=16)
plt.title(' from the percentage side ',fontsize=18)
plt.legend()
plt.grid(True)
plt.show()
-> Similar trends as above
-> Seems that 'P50:P20_bhc' should be bigger than 1, current result could be P20:P50, so let's try to reverse it. And we can double check that: 'P50:P20_bhc' = 'P80:P20_bhc'/'P80:P50_bhc'. General calcualtion shows that our estimation is right. Both ways have the same result.
df['P50:P20_bhc'] = df['P50:P20_bhc'].apply(lambda x: 1/x if pd.notnull(x) and x != 0 else x)
df['year'] = pd.to_datetime(df['year'])
selected_columns = [ 'P90:P10_bhc', 'P80:P20_bhc', 'P80:P50_bhc', 'P50:P20_bhc']
for column in selected_columns:
plt.plot(df['year'], df[column], marker='o', linestyle='-', label=column)
plt.xlabel('Year',fontsize=16)
plt.ylabel('Value',fontsize=16)
plt.title(' from the percentage side ',fontsize=18)
plt.legend()
plt.grid(True)
plt.show()
df_percentage = df[['P90:P10_bhc', 'P80:P20_bhc', 'P80:P50_bhc', 'P50:P20_bhc']].transpose()
fig, ax = plt.subplots()
ax.plot(df_percentage.index, df_percentage.values)
[<matplotlib.lines.Line2D at 0x223ec7e8110>, <matplotlib.lines.Line2D at 0x223ec82f350>, <matplotlib.lines.Line2D at 0x223ec7ea090>, <matplotlib.lines.Line2D at 0x223ec7e91d0>, <matplotlib.lines.Line2D at 0x223ec7ea450>, <matplotlib.lines.Line2D at 0x223ec7e8b10>, <matplotlib.lines.Line2D at 0x223ec7e8310>, <matplotlib.lines.Line2D at 0x223ec7eb590>, <matplotlib.lines.Line2D at 0x223ec7eb650>, <matplotlib.lines.Line2D at 0x223ec7eacd0>, <matplotlib.lines.Line2D at 0x223ec7ebd50>, <matplotlib.lines.Line2D at 0x223ec7ea550>, <matplotlib.lines.Line2D at 0x223ec7eae50>, <matplotlib.lines.Line2D at 0x223ec7ea9d0>, <matplotlib.lines.Line2D at 0x223ec7e8fd0>, <matplotlib.lines.Line2D at 0x223ec7f3e50>, <matplotlib.lines.Line2D at 0x223ec7d6210>, <matplotlib.lines.Line2D at 0x223ec7f0690>, <matplotlib.lines.Line2D at 0x223ec7f1350>, <matplotlib.lines.Line2D at 0x223ec7f02d0>, <matplotlib.lines.Line2D at 0x223ec7f04d0>, <matplotlib.lines.Line2D at 0x223ec7f1c90>, <matplotlib.lines.Line2D at 0x223ec7eaed0>, <matplotlib.lines.Line2D at 0x223ec833750>, <matplotlib.lines.Line2D at 0x223ec831a10>, <matplotlib.lines.Line2D at 0x223ec7f15d0>, <matplotlib.lines.Line2D at 0x223ec7f0110>, <matplotlib.lines.Line2D at 0x223ec7f3ad0>, <matplotlib.lines.Line2D at 0x223ec7f2e50>, <matplotlib.lines.Line2D at 0x223ec16dd90>, <matplotlib.lines.Line2D at 0x223ec7ffed0>, <matplotlib.lines.Line2D at 0x223ec82d0d0>, <matplotlib.lines.Line2D at 0x223ec7fd250>, <matplotlib.lines.Line2D at 0x223ec7fe390>, <matplotlib.lines.Line2D at 0x223ec7fe1d0>, <matplotlib.lines.Line2D at 0x223ec7ff850>, <matplotlib.lines.Line2D at 0x223ec7ff790>]
df[['P90:P10_bhc', 'P80:P20_bhc', 'P80:P50_bhc', 'P50:P20_bhc']].plot(kind='box')
<Axes: >
-> P90:P10 box is bigger, show the bigger changes. Combined with curve, we see that distance between the richest and poorest is even bigger year after year. This also influences the GINI level.
df['year'] = pd.to_datetime(df['year'])
selected_columns = [ 'GINI-BHC']
for column in selected_columns:
plt.plot(df['year'], df[column], marker='o', linestyle='-', label=column)
plt.xlabel('Year',fontsize=16)
plt.ylabel('Value',fontsize=16)
plt.title('GINI',fontsize=18)
plt.legend()
plt.grid(True)
plt.show()
-> Interpretation of the Gini coefficient can vary depending on the context and the country. However, in general: A Gini coefficient below 0.3 is considered low inequality. A Gini coefficient between 0.3 and 0.4 is considered moderate inequality. A Gini coefficient between 0.4 and 0.6 is considered high inequality. A Gini coefficient above 0.6 is considered severe inequality.
-> big increase from 1990s and relative big variation from 2008 to 2016; although it is still under 0.4, but the inequality is increasing.
df['year'] = pd.to_datetime(df['year'])
selected_columns = [ 'UN_middle_class_def_inc_share', 'OECD_middle_class_def_inc_share']
for column in selected_columns:
plt.plot(df['year'], df[column], marker='o', linestyle='-', label=column)
plt.xlabel('Year',fontsize=16)
plt.ylabel('Value',fontsize=16)
plt.title('UN middle class and OECD middle class',fontsize=18)
plt.legend()
plt.grid(True)
plt.show()
Percent share of all income earned by lower earners as defined by the UN (Deciles 1-4). Actually this indicates the lowest income group (Deciles 1-4). It decreased in 1990s and then is relatively stable.
Percent Share of all income earned by the Middle Class as defined by OECD (75%-200% of median). This can indicate the middle class. It has similar trend as the "UN_middle_class_def_inc_share" and also decreased in 1990s, but then increases again, which is different from "UN_middle_class_def_inc_share". So the middle class's situation improves compared to the group (Deciles 1-4)
-> we can see some corelation of the two variables with 'GINI-BHC'. More deeper analysis can be done in next section.
df['year'] = pd.to_datetime(df['year'])
selected_columns = ['LIH_B60_contemp_median_AHC', 'LIH_B50_contemp_median_AHC', 'LIH_B50_CV_AHC', 'LIH_B60_CV_AHC', 'LIH_B40_contemp_median_AHC', 'LIEH_B60_contemp_median_AHC', 'LIEH_B50_contemp_median_AHC', 'LIEH_B50_CV_median_AHC', 'LIEH_B60_CV_median_AHC', 'LIS65H_B60_CV_median_AHC', 'LISPH_B60_CV_median_AHC', 'LIS65H_B50_CV_median_AHC', 'LISPH_B50_CV_median_AHC', 'child_poverty_ahc_perc_HH_B50_CV', 'child_poverty_ahc_perc_HH_B60_CV', 'child_poverty_ahc_perc_HH_B40_contemp_median', 'child_poverty_ahc_perc_HH_B50_contemp_median', 'child_poverty_ahc_perc_HH_B60_contemp_median', 'child_poverty_ahc_perc_HH_B60_CV_one_or_more_FT_working_parents', 'child_poverty_ahc_perc_HH_B60_CV_one_or_more_PT_working_parents', 'rate_of_personal_insolvencies', 'loan_delinquency_percent']
for column in selected_columns:
plt.plot(df['year'], df[column], marker='o', linestyle='-', label=column)
plt.xlabel('Year',fontsize=16)
plt.ylabel('Value',fontsize=16)
plt.title('Poverty',fontsize=18)
plt.legend()
plt.grid(True)
plt.show()
-> Only see one sudden increase of "OECD_middle_class_def_inc_share" in beginning of 2009, maybe related to financial crisis in 2008;
df['year'] = pd.to_datetime(df['year'])
selected_columns = ['LIH_B40_contemp_median_AHC', 'LIH_B50_contemp_median_AHC', 'LIH_B60_contemp_median_AHC', 'LIH_B50_CV_AHC', 'LIH_B60_CV_AHC']
for column in selected_columns:
plt.plot(df['year'], df[column], marker='o', linestyle='-', label=column)
plt.xlabel('Year',fontsize=16)
plt.ylabel('Value',fontsize=16)
plt.title('percentage of households earning',fontsize=18)
plt.legend()
plt.grid(True)
plt.show()
-> 'LIH_B50_CV_AHC', 'LIH_B60_CV_AHC' decease after 2007, while 'LIH_B40_contemp_median_AHC', 'LIH_B50_contemp_median_AHC', 'LIH_B60_contemp_median_AHC' vibrate up and down. The reason is that the income after housing costs increases;
df['year'] = pd.to_datetime(df['year'])
selected_columns = ['LIEH_B50_contemp_median_AHC', 'LIEH_B60_contemp_median_AHC', 'LIEH_B50_CV_median_AHC', 'LIEH_B60_CV_median_AHC', 'LIH_B50_contemp_median_AHC', 'LIH_B60_contemp_median_AHC', 'LIH_B50_CV_AHC', 'LIH_B60_CV_AHC']
for column in selected_columns:
plt.plot(df['year'], df[column], marker='o', linestyle='-', label=column)
plt.xlabel('Year',fontsize=16)
plt.ylabel('Value',fontsize=16)
plt.title('the ederly situation',fontsize=18)
plt.legend()
plt.grid(True)
plt.show()
-> the elderly situation is simliar, but seems that their situation is better than the overall. Bigger decrease compared to 2007, that means that the elderly's income after housing cost has bigger increase compared to overall after 2007. But obviously not so good after 2014;
df['year'] = pd.to_datetime(df['year'])
selected_columns = [ 'child_poverty_ahc_perc_HH_B40_contemp_median', 'child_poverty_ahc_perc_HH_B50_contemp_median', 'child_poverty_ahc_perc_HH_B60_contemp_median','child_poverty_ahc_perc_HH_B50_CV', 'child_poverty_ahc_perc_HH_B60_CV', 'child_poverty_ahc_perc_HH_B60_CV_one_or_more_FT_working_parents', 'child_poverty_ahc_perc_HH_B60_CV_one_or_more_PT_working_parents']
for column in selected_columns:
plt.plot(df['year'], df[column], marker='o', linestyle='-', label=column)
plt.xlabel('Year',fontsize=16)
plt.ylabel('Value',fontsize=16)
plt.title('earning of families with children',fontsize=18)
plt.legend()
plt.grid(True)
plt.show()
df['year'] = pd.to_datetime(df['year'])
selected_columns = ['rate_of_personal_insolvencies', 'loan_delinquency_percent']
for column in selected_columns:
plt.plot(df['year'], df[column], marker='o', linestyle='-', label=column)
plt.xlabel('Year',fontsize=16)
plt.ylabel('Value',fontsize=16)
plt.title('personal insolvencies and loan delinquency',fontsize=18)
plt.legend()
plt.grid(True)
plt.show()
-> For 'rate_of_personal_insolvencies', the number at the end of 2006 is not right. 11.9236 could be 0.119236. But just take it as none.
-> Sudden big increase of 'loan_delinquency_percent' at end of 2009, could also be the subsequent results of financial crisis in 2008;
df.loc[27, 'rate_of_personal_insolvencies'] = np.nan
df['national_house_rental_A30%_income_spend'] = df['national_house_rental_A30%_income_spend'].apply(lambda x: x * 100 if x is not None else x)
df['national_house_purchasing_A30%_income_spend'] = df[ 'national_house_purchasing_A30%_income_spend'].apply(lambda x: x * 100 if x is not None else x)
df['priority_A_state_housing_applicants_as_pop_percent'] = df['priority_A_state_housing_applicants_as_pop_percent'].apply(lambda x: x * 100 if x is not None else x)
df['priority_B_state_housing_applicants_as_pop_percent'] = df[ 'priority_B_state_housing_applicants_as_pop_percent'].apply(lambda x: x * 100 if x is not None else x)
df['year'] = pd.to_datetime(df['year'])
selected_columns = ['national_house_rental_A30%_income_spend', 'national_house_purchasing_A30%_income_spend', 'national_house_rental_HAM', 'national_house_purchasing_HAM', 'home_ownership_rate', 'homelessness_percentage_of_population', 'priority_A_state_housing_applicants_as_pop_percent', 'priority_B_state_housing_applicants_as_pop_percent', 'debt_servicing_as_percent_of_disp_inc', 'house_median_multiples']
for column in selected_columns:
plt.plot(df['year'], df[column], marker='o', linestyle='-', label=column)
plt.xlabel('Year',fontsize=16)
plt.ylabel('Value',fontsize=16)
plt.title('Housing',fontsize=18)
plt.legend()
plt.grid(True)
plt.show()
df['year'] = pd.to_datetime(df['year'])
selected_columns = ['national_house_purchasing_A30%_income_spend', 'national_house_purchasing_HAM', 'home_ownership_rate', 'debt_servicing_as_percent_of_disp_inc', 'house_median_multiples']
for column in selected_columns:
plt.plot(df['year'], df[column], marker='o', linestyle='-', label=column)
plt.xlabel('Year',fontsize=16)## Group 1 from purchasing side
plt.ylabel('Value',fontsize=16)
plt.title('from house purchasing side',fontsize=18)
plt.legend()
plt.grid(True)
plt.show()
-> 'home_ownership_rate' decreases from 1992 to 2016, which is corelated to the 'house_median_multiples' which always increases. This means that the housing price increase is bigger than the income increase, so that less people will afford a housing. It is also somehow corelated to the 'national_house_purchasing_A30%_income_spend' which has overall increase trend.
-> 'national_house_purchasing_A30%_income_spend' is very corelated with 'debt_servicing_as_percent_of_disp_inc'. The higher percentage of households that spend above 30% of income on housing costs, the higher percentage they have debt servicing of nominal disposable income.
-> 'national_house_purchasing_HAM' is relative stable. That indicates that the first buyer's decision based on finanical status has nearly no big change.
df['priority_AandB']=df['priority_A_state_housing_applicants_as_pop_percent']+df['priority_B_state_housing_applicants_as_pop_percent']
C:\Users\lpxue\AppData\Local\Temp\ipykernel_13584\2667346656.py:1: PerformanceWarning: DataFrame is highly fragmented. This is usually the result of calling `frame.insert` many times, which has poor performance. Consider joining all columns at once using pd.concat(axis=1) instead. To get a de-fragmented frame, use `newframe = frame.copy()` df['priority_AandB']=df['priority_A_state_housing_applicants_as_pop_percent']+df['priority_B_state_housing_applicants_as_pop_percent']
df['year'] = pd.to_datetime(df['year'])
selected_columns = ['priority_AandB','national_house_rental_A30%_income_spend','national_house_purchasing_A30%_income_spend', 'national_house_rental_HAM', 'priority_A_state_housing_applicants_as_pop_percent', 'priority_B_state_housing_applicants_as_pop_percent']
for column in selected_columns:
plt.plot(df['year'], df[column], marker='o', linestyle='-', label=column)
plt.xlabel('Year',fontsize=16)
plt.ylabel('Value',fontsize=16)
plt.title('for renting and house applicants',fontsize=18)
plt.legend()
plt.grid(True)
plt.show()
-> 'priority_A_state_housing_applicants_as_pop_percent', 'priority_B_state_housing_applicants_as_pop_percent' have opppiste trend, as some priority B has changed into priority A when their situation became worse. Priority A plus Priority B is still stable with some derease in 2013 but then increases again.That means after 2013, more are at risk of housing. This is quite corelated with ,'national_house_purchasing_A30%_income_spend'. When more income is spending on the housing costs, then more Priority A+ Priority B.
Priority A refers to applicants who are considered at risk and includes households with a severe and persistent housing need that must be addressed immediately.
Priority B refers to applicants who have a serious housing need and includes households with a significant and persistent need.
-> 'national_house_rental_A30%_income_spend' and 'national_house_rental_HAM' is relative stable. So that means for the renting part, it is stable.
df['year'] = pd.to_datetime(df['year'])
selected_columns = ['unemployment_rate', '60_64_years_unemp_rate', '65years_and_over_unemp_rate', 'underemployment_rate', 'employees_working_long_hours', 'labour_market_insecurity', 'long_term_unemployment_rate', 'pop_15_to_29_NEET', 'low_pay_percent_OECD_def', 'low_pay_percent_120percent_MW_def', 'living_min_wage_gap_magnitude', 'LIS_ms-16', 'LP:RPW_ms-16']
for column in selected_columns:
plt.plot(df['year'], df[column], marker='o', linestyle='-', label=column)
plt.xlabel('Year',fontsize=16)
plt.ylabel('Value',fontsize=16)
plt.title('Labour',fontsize=18)
plt.legend()
plt.grid(True)
plt.show()
df['year'] = pd.to_datetime(df['year'])
selected_columns = ['unemployment_rate', '60_64_years_unemp_rate', '65years_and_over_unemp_rate', 'underemployment_rate', 'labour_market_insecurity', 'long_term_unemployment_rate', 'pop_15_to_29_NEET']
for column in selected_columns:
plt.plot(df['year'], df[column], marker='o', linestyle='-', label=column)
plt.xlabel('Year',fontsize=16)
plt.ylabel('Value',fontsize=16)
plt.title('unemployment',fontsize=18)
plt.legend()
plt.grid(True)
plt.show()
Long-term unemployment refers to people who have been unemployed for 12 months or more. The long-term unemployment rate shows the proportion of these long-term unemployed among all unemployed. Long-term unemployment causes significant mental and material stress for those affected and their families. It is also of particular concern for policy makers, as high rates of long-term unemployment indicate that labour markets are operating inefficiently. This indicator is measured as a percentage of unemployed. "https://data.oecd.org/unemp/long-term-unemployment-rate.htm"
Underemployment is a measure of the total number of people in an economy who are unwillingly working in low-skill and low-paying jobs or only part-time because they cannot get full-time jobs that use their skills. https://www.investopedia.com/terms/u/underemployment.asp#:~:text=The%20U.S.%20unemployment%20rate%20was%2013.3%25%20as%20of,is%20the%20sum%20of%20the%20employed%20and%20unemployed%29.%22
-> from above definition, we see big improvement of unemployment from 1993 to 2009, but from 2009 to 2018 the situation is not so good.
-> '60_64_years_unemp_rate' is corelated with the 'unemployment_rate'. '65years_and_over_unemp_rate' is stable and retirement is normal.
-> 'labour_market_insecurity', "underemployment_rateand" and 'pop_15_to_29_NEET' are very corelated with the 'unemployment_rate'. This is rational.
df['year'] = pd.to_datetime(df['year'])
selected_columns = ['employees_working_long_hours', 'low_pay_percent_OECD_def', 'low_pay_percent_120percent_MW_def','living_min_wage_gap_magnitude', 'LIS_ms-16', 'LP:RPW_ms-16']
for column in selected_columns:
plt.plot(df['year'], df[column], marker='o', linestyle='-', label=column)
plt.xlabel('Year',fontsize=16)
plt.ylabel('Value',fontsize=16)
plt.title('employment',fontsize=18)
plt.legend()
plt.grid(True)
plt.show()
-> nearly no big findings from the three avriables.
-> Sudden increase of 'employees_working_long_hours' and decrease of 'low_pay_percent_OECD_def', opposite trend.
-> 'LIS_ms-16', 'LP:RPW_ms-16' are very stable. Stable industry and ratio of labour productivity (indexed) to real product wages (indexed) in MS-16 industries.
df['year'] = pd.to_datetime(df['year'])
selected_columns = ['teriary_education_participation_rate', 'education_spend_as_percent_of_GDP', 'education_spend_as_percent_of_total_gov_expenses', 'median_teriary_loan_as_perc_of_med_inc_AHC', 'median_teriary_loan_balance_as_perc_of_med_inc_AHC', 'university_affordability_ratio_to_mean_salary', 'polytechnics_affordability_ratio_to_mean_salary', 'wananga_affordability_ratio_to_mean_salary', 'bachelors_earning_premium_hourly', 'dips_cert_earning_premium_hourly', 'school_earning_premium_hourly', 'bachelors_earning_premium_weekly', 'dips_cert_earning_premium_weekly', 'school_earning_premium_weekly']
for column in selected_columns:
plt.plot(df['year'], df[column], marker='o', linestyle='-', label=column)
plt.xlabel('Year',fontsize=16)
plt.ylabel('Value',fontsize=16)
plt.title('Education',fontsize=18)
plt.legend()
plt.grid(True)
plt.show()
df['year'] = pd.to_datetime(df['year'])
selected_columns = ['teriary_education_participation_rate', 'education_spend_as_percent_of_GDP', 'education_spend_as_percent_of_total_gov_expenses', 'median_teriary_loan_as_perc_of_med_inc_AHC', 'median_teriary_loan_balance_as_perc_of_med_inc_AHC', 'university_affordability_ratio_to_mean_salary', 'polytechnics_affordability_ratio_to_mean_salary', 'wananga_affordability_ratio_to_mean_salary']
for column in selected_columns:
plt.plot(df['year'], df[column], marker='o', linestyle='-', label=column)
plt.xlabel('Year',fontsize=16)
plt.ylabel('Value',fontsize=16)
plt.title('education investment and cost',fontsize=18)
plt.legend()
plt.grid(True)
plt.show()
-> 'education_spend_as_percent_of_GDP', 'education_spend_as_percent_of_total_gov_expenses' show the government investment on education. Stable percentage of GDP but overall increasing percentage in total government expenses. Shows that the government shows high attention on education.
-> 'university_affordability_ratio_to_mean_salary' is relative stable with small increase. 'polytechnics_affordability_ratio_to_mean_salary' and 'wananga_affordability_ratio_to_mean_salary' decreased bigly from 2001to 2004 and then became relative stable.
-> 'median_teriary_loan_as_perc_of_med_inc_AHC' and 'median_teriary_loan_balance_as_perc_of_med_inc_AHC' has big increase from 2011. Which shows students has higher loaning although the cost is relative stable. So thatn means the family may spend more money on housing, so they have less money on education and have to loan.
-> 'teriary_education_participation_rate' increased before 2006 and decreased after 2006.
df['year'] = pd.to_datetime(df['year'])
selected_columns = ['bachelors_earning_premium_hourly', 'dips_cert_earning_premium_hourly', 'school_earning_premium_hourly', 'bachelors_earning_premium_weekly', 'dips_cert_earning_premium_weekly', 'school_earning_premium_weekly']
for column in selected_columns:
plt.plot(df['year'], df[column], marker='o', linestyle='-', label=column)
plt.xlabel('Year',fontsize=16)
plt.ylabel('Value',fontsize=16)
plt.title('earnings of different study degrees',fontsize=18)
plt.legend()
plt.grid(True)
plt.show()
-> in general it shows the education is quite related to the income/earning.
-> the earning of school qualification is a little higher than the no qualification. The diploma and master's earnings are much higher than the no qualification. So the education is important for earnings.
-> seprately show the 'health_expenditure_per_capita_PPP', as the number is much bigger than other variables.
selected_columns = ['health_expenditure_per_capita_PPP']
for column in selected_columns:
plt.plot(df['year'], df[column], marker='o', linestyle='-', label=column)
plt.xlabel('Year',fontsize=16)
plt.ylabel('Value',fontsize=16)
plt.title('health_expenditure_per_capita_PPP',fontsize=18)
plt.legend()
plt.grid(True)
plt.show()
selected_columns = ['health_expenditure_as_percent_of_gdp']
for column in selected_columns:
plt.plot(df['year'], df[column], marker='o', linestyle='-', label=column)
plt.xlabel('Year',fontsize=16)
plt.ylabel('Value',fontsize=16)
plt.title('health_expenditure_as_percent_of_gdp',fontsize=18)
plt.legend()
plt.grid(True)
plt.show()
-> from 2001 to 2016, nearly double. Big increase of the expenditure per capita, and big increase of pecentage in GDP, but not doubled.
selected_columns = ['depression_adult', 'excellent_very_good_or_good_self_rated_health', 'psychological_distress_adult', 'mood_anxiety_disorder_adult', 'healthy_weight_adult', 'unmet_after_hours_care_due_to_cost_adult', 'unmet_GP_care_due_to_cost_adult', 'veg_and_fruit_intake_adult']
for column in selected_columns:
plt.plot(df['year'], df[column], marker='o', linestyle='-', label=column)
plt.xlabel('Year',fontsize=16)
plt.ylabel('Value',fontsize=16)
plt.title('adult health',fontsize=18)
plt.legend()
plt.grid(True)
plt.show()
-> 'depression_adult', 'psychological_distress_adult' and 'mood_anxiety_disorder_adult' all increased slowly from 2013 to 2018;
-> 'healthy_weight_adult' and 'veg_and_fruit_intake_adult' both decreased from 2013 to 2018. The two avriables are corelated. Vegetable and fruit intake is related to health weight.
-> 'unmet_after_hours_care_due_to_cost_adult' and 'unmet_GP_care_due_to_cost_adult' are relative stable.
-> 'excellent_very_good_or_good_self_rated_health' has high rate. This is strange, as the 'depression_adult' rate is high.
df.loc[23, 'unmet_GP_care_due_to_cost_adult'] = np.nan
selected_columns = ['depression_adult', 'excellent_very_good_or_good_self_rated_health', 'psychological_distress_adult', 'mood_anxiety_disorder_adult', 'healthy_weight_adult', 'unmet_after_hours_care_due_to_cost_adult', 'unmet_GP_care_due_to_cost_adult', 'veg_and_fruit_intake_adult']
for column in selected_columns:
plt.plot(df['year'], df[column], marker='o', linestyle='-', label=column)
plt.xlabel('Year',fontsize=16)
plt.ylabel('Value',fontsize=16)
plt.title('adult health',fontsize=18)
plt.legend()
plt.grid(True)
plt.show()
selected_columns = [ 'vegetable_and_fruit_intake_child', 'healthy_weight_child','excellent_very_good_or_good_parent_rated_health_child']
for column in selected_columns:
plt.plot(df['year'], df[column], marker='o', linestyle='-', label=column)
plt.xlabel('Year',fontsize=16)
plt.ylabel('Value',fontsize=16)
plt.title('children health: vegetable, weight, health',fontsize=18)
plt.legend()
plt.grid(True)
plt.show()
selected_columns = ['breakfast_eaten_at_home_less_5_days_a_week_child', 'emotional_behavioural_problems_child', 'adult_diabetes_rates', 'depression_child', 'unfilled_prescription_due_to_cost_child', 'unmet_need_for_after_hours_due_to_cost_child', 'unmet_need_for_GP_due_to_cost_child']
for column in selected_columns:
plt.plot(df['year'], df[column], marker='o', linestyle='-', label=column)
plt.xlabel('Year',fontsize=16)
plt.ylabel('Value',fontsize=16)
plt.title('childeren health',fontsize=18)
plt.legend()
plt.grid(True)
plt.show()
-> 'vegetable_and_fruit_intake_child', 'healthy_weight_child' are related. 'vegetable_and_fruit_intake_child' decreased from 2016 to 2018. 'excellent_very_good_or_good_parent_rated_health_child' is high, but actually the parents should care that their children should take more vegetables and fruits.
-> both 'breakfast_eaten_at_home_less_5_days_a_week_child' and 'emotional_behavioural_problems_child' increased slowly from 2016 to 2018; seems that the parents should care more about the children from food to emotion.
-> 'unfilled_prescription_due_to_cost_child', 'unmet_need_for_after_hours_due_to_cost_child', 'unmet_need_for_GP_due_to_cost_child' slowly decreased from 2016 to 2018. That means more support from the government for children are needed.
-> 'depression_child' is relative stable and slowly decreased from 2016 to 2018. Is this real? maybe some cases are not detected because of unfilled prescription etc.
selected_columns = ['suicides_per_100000', 'problem_gambling_intervention_prevelance_percent']
for column in selected_columns:
plt.plot(df['year'], df[column], marker='o', linestyle='-', label=column)
plt.xlabel('Year',fontsize=16)
plt.ylabel('Value',fontsize=16)
plt.title('suicide and gambling percentage',fontsize=18)
plt.legend()
plt.grid(True)
plt.show()
df['year'] = pd.to_datetime(df['year'])
selected_columns = ['total_prisoners_in_remand_rate', 'total_sentenced_prisoners_rate', 'total_post_sentence_offender_population_rate']
for column in selected_columns:
plt.plot(df['year'], df[column], marker='o', linestyle='-', label=column)
plt.xlabel('Year',fontsize=16)
plt.ylabel('Value',fontsize=16)
plt.title('prisoners in remand, sentenced and post sentenced',fontsize=18)
plt.legend()
plt.grid(True)
plt.show()
-> 'total_prisoners_in_remand_rate' and 'total_post_sentence_offender_population_rate' have similar trend. So we can also see the 'total_sentenced_prisoners_rate' is relative stable in between. -> overall we see the 'total_prisoners_in_remand_rate' increase, that means the overall saftey and security decreased, especially from 2015 to 2018.
df['year'] = pd.to_datetime(df['year'])
selected_columns = ['violent_crime_victimisations_rate', 'recorded_murders_and_homicides_per_million']
for column in selected_columns:
plt.plot(df['year'], df[column], marker='o', linestyle='-', label=column)
plt.xlabel('Year',fontsize=16)
plt.ylabel('Value',fontsize=16)
plt.title('violance and murder',fontsize=18)
plt.legend()
plt.grid(True)
plt.show()
df['year'] = pd.to_datetime(df['year'])
selected_columns = ['regional_gdp_proportional_variation', 'difference_in_percent_for_low_income_by_gender', 'gender_pay_gap']
for column in selected_columns:
plt.plot(df['year'], df[column], marker='o', linestyle='-', label=column)
plt.xlabel('Year',fontsize=16)
plt.ylabel('Value',fontsize=16)
plt.title('General inequality',fontsize=18)
plt.legend()
plt.grid(True)
plt.show()
df.loc[19, 'gender_pay_gap'] = np.nan
df.loc[10, 'difference_in_percent_for_low_income_by_gender'] = np.nan
df['year'] = pd.to_datetime(df['year'])
selected_columns = ['regional_gdp_proportional_variation', 'difference_in_percent_for_low_income_by_gender', 'gender_pay_gap']
for column in selected_columns:
plt.plot(df['year'], df[column], marker='o', linestyle='-', label=column)
plt.xlabel('Year',fontsize=16)
plt.ylabel('Value',fontsize=16)
plt.title('General inequality',fontsize=18)
plt.legend()
plt.grid(True)
plt.show()
-> Proportional variation is good, and 'gender_pay_gap' is becoming even smaller. These are good indicates. -> In EDA and data visualisation we primarily grouped and checked the data details to get primary overall findings; we correct some data at the same time.
Although for each group there are many variables, through the sub-group of these variables of each group we can find the corelation between them for better understanding this group status and trends. In EDA and data visualisation we alreday got primary impression through sub-grouping. Based on the primary findings, in data analysis I believe that we can find more clear corelation of variables in same group through detailed corelation analysis of each group, and this will help us to focus on some key variables to simplify the analysis. This will also help us to understand each group deeper.
In EDA and data visualisation, we already see some relationship between the eight different groups. So when we select the representative variable of each group, we can furhter study the corelation of each group. This will help us to understand the reasons for changes of each group, to see how they influence each other. General assumptions are as below:
1.Labour is a very important indicator of economic status/level of a country and influences many other parts. New Zealand is a relative stable market with constent policy, and it is not a big country without mature or complete supply chain for most industries, so big changes of labour is mainly caused by external influences such like global financial crisis 2007-2009, Covid etc. Labour status also influences the gender inequality;
2.Besides the external influence, education is also quite related to labour, and of course also realted to income. The eduction system prepares and trains the future labor force;
3.People get income through labour, so labour status decides the income and long termly also decides the wealth;
4.If the labour staus is good, people would like to spend more money on housing and will have more money for health.
5.If people spend much money on housing and health but can not afford them with good labour, this will leads to poverty;
6.Poverty will leads to safety and security issues. Safety and security may also cause the poverty;
In data analysis we will check and adjust the assumption based on new and deeper data analysis. In the end we will try to provide related recommendations.
General flow chart based on above understanding, the corelation will be justified and adjusted through detailed data analysis in next step.
from IPython.display import Image
Image(filename="D:/a.jpg", width=800, height=400)
df['year'] = pd.to_datetime(df['year'])
selected_columns = ['unemployment_rate', '60_64_years_unemp_rate', '65years_and_over_unemp_rate', 'underemployment_rate', 'labour_market_insecurity', 'long_term_unemployment_rate', 'pop_15_to_29_NEET']
for column in selected_columns:
plt.plot(df['year'], df[column], marker='o', linestyle='-', label=column)
plt.xlabel('Year',fontsize=16)
plt.ylabel('Value',fontsize=16)
plt.title('Labour of Unemployment',fontsize=18)
plt.legend()
plt.grid(True)
plt.show()
columns_needed = ['unemployment_rate', '60_64_years_unemp_rate', '65years_and_over_unemp_rate', 'underemployment_rate', 'labour_market_insecurity', 'long_term_unemployment_rate', 'pop_15_to_29_NEET']
sub_data = df[columns_needed]
correlation_matrix = sub_data.corr()
correlation_matrix
| unemployment_rate | 60_64_years_unemp_rate | 65years_and_over_unemp_rate | underemployment_rate | labour_market_insecurity | long_term_unemployment_rate | pop_15_to_29_NEET | |
|---|---|---|---|---|---|---|---|
| unemployment_rate | 1.000000 | 0.298467 | 0.383727 | 0.806789 | 0.991128 | 0.839564 | 0.835874 |
| 60_64_years_unemp_rate | 0.298467 | 1.000000 | -0.164078 | 0.819555 | 0.851800 | 0.335302 | 0.605242 |
| 65years_and_over_unemp_rate | 0.383727 | -0.164078 | 1.000000 | -0.147069 | 0.820517 | 0.243247 | 0.127590 |
| underemployment_rate | 0.806789 | 0.819555 | -0.147069 | 1.000000 | 0.785735 | 0.573200 | 0.667378 |
| labour_market_insecurity | 0.991128 | 0.851800 | 0.820517 | 0.785735 | 1.000000 | 0.450770 | 0.796450 |
| long_term_unemployment_rate | 0.839564 | 0.335302 | 0.243247 | 0.573200 | 0.450770 | 1.000000 | -0.002802 |
| pop_15_to_29_NEET | 0.835874 | 0.605242 | 0.127590 | 0.667378 | 0.796450 | -0.002802 | 1.000000 |
Unemployment was continously decreased from 1993 to 2008, but increased from 2008 mainly caused by global financial crisis in 2008, an then decreased again from 2013. When financial crisis happened, the young group is most influenced.
-> "unemployment rate" and "long term unemployment rate" are two most important indicators. Both decreased from 1993/1994 and increased from 2008/2009. Because the "long-term unemployment" refers to people who have been unemployed for 12 months or more, we see the peak and turning point is one year later compared to "unemployment", but still with 0.84 corelation;
-> elderly group "60-65" umemployment is closely related to overall "unemployment rate", while "over 65" is not that senstive and relative stable as most of them are retired; corelation to "unemployment rate" is relatvie low with around 0.3;
-> young group "15-29 years" also follow the trend of "unemployment rate" closely with 0.84 corelation, and even more senstive in 2008 (sudden big increase);
-> "labour market insecurity" exactly followed the trend of "unemployment rate" with 0.99 corelation. This is reasonable. "underemployment rate" is relative stable when "unemployment rate" changes, as they only work as part time job, with 0.81 corelation.
df['year'] = pd.to_datetime(df['year'])
selected_columns = ['employees_working_long_hours', 'low_pay_percent_OECD_def', 'low_pay_percent_120percent_MW_def','living_min_wage_gap_magnitude', 'LIS_ms-16', 'LP:RPW_ms-16']
for column in selected_columns:
plt.plot(df['year'], df[column], marker='o', linestyle='-', label=column)
plt.xlabel('Year',fontsize=16)
plt.ylabel('Value',fontsize=16)
plt.title('Labour of Employment',fontsize=18)
plt.legend()
plt.grid(True)
plt.show()
from scipy.stats import linregress
non_empty_values = df['employees_working_long_hours'].dropna().tolist()
x = np.arange(len(non_empty_values))
y = np.array(non_empty_values)
slope, intercept, r_value, p_value, std_err = linregress(x, y)
plt.plot(x, y, color='red', label='Data')
plt.plot(x, slope * x + intercept, color='blue')
plt.title('Employees with long working hours_Regression line',fontsize=18)
Text(0.5, 1.0, 'Employees with long working hours_Regression line')
mean1=df['low_pay_percent_OECD_def'].mean()
mean1
13.030602276190477
Internal structure in the "employment" is relative stable even "unemployment rate" changes rapidly.
-> according to above regression line we can see that the "employees with long working hours " overall decresed, which means the working condition has been improved. "low pay percentage OECD" is relative stable with result around 13;
->"low pay pecentage compared to 120%" does not have many data, so will not analyze it
-> "Labour share of income in MS-16 industries" is around 55%. This means ths MS-16 industries contribute more than half of the labour and is very stable. Same stable as "Ratio of labour productivity (indexed) to real product wages (indexed) in MS-16 industries."
df['year'] = pd.to_datetime(df['year'])
selected_columns = ['bachelors_earning_premium_hourly', 'dips_cert_earning_premium_hourly', 'school_earning_premium_hourly', 'bachelors_earning_premium_weekly', 'dips_cert_earning_premium_weekly', 'school_earning_premium_weekly']
for column in selected_columns:
plt.plot(df['year'], df[column], marker='o', linestyle='-', label=column)
plt.xlabel('Year',fontsize=16)
plt.ylabel('Value',fontsize=16)
plt.title('Earnings with different education',fontsize=18)
plt.legend()
plt.grid(True)
plt.show()
columns = ['bachelors_earning_premium_hourly', 'dips_cert_earning_premium_hourly', 'school_earning_premium_hourly']
mean1= df[columns].mean()
mean1
bachelors_earning_premium_hourly 62.390785 dips_cert_earning_premium_hourly 29.214599 school_earning_premium_hourly 4.456910 dtype: float64
-> Education degree is clearly related to earning. Behind the earning is the Labour. Higher education will have good job and will have high income.
-> for hourly earnings, bachelor can earn on average 62 NZD and diploma/certificate earn on average 29 NZD, while school qualification only earn on average 4.5 NZD, which means many of them do not have a job.
df['year'] = pd.to_datetime(df['year'])
selected_columns = ['teriary_education_participation_rate','education_spend_as_percent_of_GDP', 'education_spend_as_percent_of_total_gov_expenses']
for column in selected_columns:
plt.plot(df['year'], df[column], marker='o', linestyle='-', label=column)
plt.xlabel('Year',fontsize=16)
plt.ylabel('Value',fontsize=16)
plt.title('Education investment and affordability',fontsize=18)
plt.legend()
plt.grid(True)
plt.show()
columns_needed = ['teriary_education_participation_rate','education_spend_as_percent_of_GDP', 'education_spend_as_percent_of_total_gov_expenses']
sub_data = df[columns_needed]
correlation_matrix = sub_data.corr()
correlation_matrix
| teriary_education_participation_rate | education_spend_as_percent_of_GDP | education_spend_as_percent_of_total_gov_expenses | |
|---|---|---|---|
| teriary_education_participation_rate | 1.000000 | 0.475257 | 0.668082 |
| education_spend_as_percent_of_GDP | 0.475257 | 1.000000 | 0.718629 |
| education_spend_as_percent_of_total_gov_expenses | 0.668082 | 0.718629 | 1.000000 |
-> Goverment investment on educaiton reached peak in year 2007 and then dreased in 2007-2009, which can also be related with financial crisis. Meanwhile the teriary education participation also decreased continously from 2006.
-> "education expenditure as percentage of total government expenses" and "Education expenditure as a percentage of GDP" are middle corelated to "teriary education participation" (0.4â€|r|<0.7), so the goverment investment on education is important for teriary education participation. To prepare the high level labour.
df['year'] = pd.to_datetime(df['year'])
selected_columns = [ 'teriary_education_participation_rate','median_teriary_loan_as_perc_of_med_inc_AHC', 'median_teriary_loan_balance_as_perc_of_med_inc_AHC']
for column in selected_columns:
plt.plot(df['year'], df[column], marker='o', linestyle='-', label=column)
plt.xlabel('Year',fontsize=16)
plt.ylabel('Value',fontsize=16)
plt.title('Teriary loan',fontsize=18)
plt.legend()
plt.grid(True)
plt.show()
df['year'] = pd.to_datetime(df['year'])
selected_columns = ['university_affordability_ratio_to_mean_salary', 'polytechnics_affordability_ratio_to_mean_salary', 'wananga_affordability_ratio_to_mean_salary']
for column in selected_columns:
plt.plot(df['year'], df[column], marker='o', linestyle='-', label=column)
plt.xlabel('Year',fontsize=16)
plt.ylabel('Value',fontsize=16)
plt.title('Education fee compared to mean earning',fontsize=18)
plt.legend()
plt.grid(True)
plt.show()
From 2010 the loan compared to income is increasing quickly. Average university fee is always increasing after 2004. This further influnces the the teriary education participation, which decreased continously from 2006.
df['year'] = pd.to_datetime(df['year'])
selected_columns = ['unemployment_rate','teriary_education_participation_rate']
for column in selected_columns:
plt.plot(df['year'], df[column], marker='o', linestyle='-', label=column)
plt.xlabel('Year',fontsize=16)
plt.ylabel('Value',fontsize=16)
plt.title('Education fee compared to mean earning',fontsize=18)
plt.legend()
plt.grid(True)
plt.show()
df = df.reset_index(drop=True)
new_df = df.loc[12:34, ['teriary_education_participation_rate','unemployment_rate']]
correlation_matrix = new_df.corr()
correlation_matrix
| teriary_education_participation_rate | unemployment_rate | |
|---|---|---|
| teriary_education_participation_rate | 1.000000 | -0.838318 |
| unemployment_rate | -0.838318 | 1.000000 |
correlation_matrix = new_df.corr()
correlation_matrix
| teriary_education_participation_rate | unemployment_rate | |
|---|---|---|
| teriary_education_participation_rate | 1.000000 | -0.838318 |
| unemployment_rate | -0.838318 | 1.000000 |
From 1999 to 2013, when the "teriary education participation' increases, the "unemployment rate" decreases. And when the "teriary education participation' decreases, the "unemployment rate" increases. The turning point is 2008 financial crisis, which caused to less employment and less investment in education. Unemployment rate decrease from 2013 may have other reasons (maybe investement in industry or others)
overall the New Zealand government increased the investment on education, and it was influenced by financial crisis 2007-2009. But the loan percentage from 2010 and also the university fee from 2004 are always increasing, so goverment needs to provide more support for education espcially for university education. To provide support for education cost, if they want to increase the teriary education participation. And the the teriary education participation can provide high quality labour and increase the labour income.
df['year'] = pd.to_datetime(df['year'])
selected_columns = ['regional_gdp_proportional_variation', 'difference_in_percent_for_low_income_by_gender', 'gender_pay_gap']
for column in selected_columns:
plt.plot(df['year'], df[column], marker='o', linestyle='-', label=column)
plt.xlabel('Year',fontsize=16)
plt.ylabel('Value',fontsize=16)
plt.title('General inequality',fontsize=18)
plt.legend()
plt.grid(True)
plt.show()
df['year'] = pd.to_datetime(df['year'])
selected_columns = ['gender_pay_gap','unemployment_rate']
for column in selected_columns:
plt.plot(df['year'], df[column], marker='o', linestyle='-', label=column)
plt.xlabel('Year',fontsize=16)
plt.ylabel('Value',fontsize=16)
plt.title('General inequality',fontsize=18)
plt.legend()
plt.grid(True)
plt.show()
df = df.reset_index(drop=True)
new_df = df.loc[16:34, ['unemployment_rate','gender_pay_gap']]
correlation_matrix = new_df.corr()
correlation_matrix
| unemployment_rate | gender_pay_gap | |
|---|---|---|
| unemployment_rate | 1.000000 | 0.171208 |
| gender_pay_gap | 0.171208 | 1.000000 |
"gender pay gap" also follows the "unemployment rate" trend. When the "unemployment rate" changes, the "gender pay gap"also changes a few years later. When the unemployment rate decreases that means more labour is needed, so the woman can also get a good pay. In difficult time when the umemployment rate increases, the gender pay gap will be bigger after a few years.
df['year'] = pd.to_datetime(df['year'])
selected_columns = ['Q5:Q1', 'D10:D1', 'D10:D1-4(Palma)','P90:P10_bhc', 'P80:P20_bhc','GINI-BHC']
for column in selected_columns:
plt.plot(df['year'], df[column], marker='o', linestyle='-', label=column)
plt.xlabel('Year',fontsize=16)
plt.ylabel('Value',fontsize=16)
plt.title('Income and wealth',fontsize=18)
plt.legend()
plt.grid(True)
plt.show()
columns_needed = ['GINI-BHC','Q5:Q1', 'D10:D1', 'D10:D1-4(Palma)','P90:P10_bhc', 'P80:P20_bhc']
sub_data = df[columns_needed]
correlation_matrix = sub_data.corr()
correlation_matrix
| GINI-BHC | Q5:Q1 | D10:D1 | D10:D1-4(Palma) | P90:P10_bhc | P80:P20_bhc | |
|---|---|---|---|---|---|---|
| GINI-BHC | 1.000000 | 0.977133 | 0.944154 | 0.993647 | 0.920902 | 0.933327 |
| Q5:Q1 | 0.977133 | 1.000000 | 0.980766 | 0.974091 | 0.935184 | 0.908565 |
| D10:D1 | 0.944154 | 0.980766 | 1.000000 | 0.952201 | 0.889478 | 0.871680 |
| D10:D1-4(Palma) | 0.993647 | 0.974091 | 0.952201 | 1.000000 | 0.860994 | 0.878936 |
| P90:P10_bhc | 0.920902 | 0.935184 | 0.889478 | 0.860994 | 1.000000 | 0.929279 |
| P80:P20_bhc | 0.933327 | 0.908565 | 0.871680 | 0.878936 | 0.929279 | 1.000000 |
Income disbribution is very corelated with GINI, so will only take GINI for further analysis.
df['year'] = pd.to_datetime(df['year'])
selected_columns = ['GINI-BHC','unemployment_rate']
for column in selected_columns:
plt.plot(df['year'], df[column], marker='o', linestyle='-', label=column)
plt.xlabel('Year',fontsize=16)
plt.ylabel('Value',fontsize=16)
plt.title('GINI and unemployment rate',fontsize=18)
plt.legend()
plt.grid(True)
plt.show()
columns_needed = ['GINI-BHC','unemployment_rate']
sub_data = df[columns_needed]
correlation_matrix = sub_data.corr()
correlation_matrix
| GINI-BHC | unemployment_rate | |
|---|---|---|
| GINI-BHC | 1.00000 | -0.04316 |
| unemployment_rate | -0.04316 | 1.00000 |
df = df.reset_index(drop=True)
new_df = df.loc[25:34, ['GINI-BHC', 'unemployment_rate']]
correlation_matrix = new_df.corr()
correlation_matrix
| GINI-BHC | unemployment_rate | |
|---|---|---|
| GINI-BHC | 1.000000 | 0.190525 |
| unemployment_rate | 0.190525 | 1.000000 |
If we take all the years' data for analysis, we found that GINI has nearly no corelation with unemployment rate. If we take the data from 2008-2017 for analysis (GINI is lacked of data before 2008), we will find that they are weakly corelated with 0.2; So main Labour indicator (unemployment rate) seems weakly influnce the main income distribution indicator (GINI); The main reason could be that the GINI is sitll in good status under 0.4 (Gini coefficient between 0.3 and 0.4 is considered moderate inequality) and the unemployment also takes time to influence the GINI.
df['year'] = pd.to_datetime(df['year'])
selected_columns = [ 'home_ownership_rate', 'house_median_multiples']
for column in selected_columns:
plt.plot(df['year'], df[column], marker='o', linestyle='-', label=column)
plt.xlabel('Year',fontsize=16)
plt.ylabel('Value',fontsize=16)
plt.title('Housing',fontsize=18)
plt.legend()
plt.grid(True)
plt.show()
columns_needed = [ 'home_ownership_rate', 'house_median_multiples']
sub_data = df[columns_needed]
correlation_matrix = sub_data.corr()
correlation_matrix
| home_ownership_rate | house_median_multiples | |
|---|---|---|
| home_ownership_rate | 1.000000 | -0.893324 |
| house_median_multiples | -0.893324 | 1.000000 |
'home ownership rate' and 'house median multiples' are very negative correlated (-0.89). This is reasonable. When people need to work more years to afford the housing purchasing price, the less will purchase or own a house.
df['year'] = pd.to_datetime(df['year'])
selected_columns = [ 'unemployment_rate','national_house_purchasing_A30%_income_spend', 'debt_servicing_as_percent_of_disp_inc']
for column in selected_columns:
plt.plot(df['year'], df[column], marker='o', linestyle='-', label=column)
plt.xlabel('Year',fontsize=16)
plt.ylabel('Value',fontsize=16)
plt.title('Unemployment, house purchasing spendure and debt',fontsize=18)
plt.legend()
plt.grid(True)
plt.show()
df = df.reset_index(drop=True)
new_df = df.loc[21:35, ['unemployment_rate','national_house_purchasing_A30%_income_spend', 'debt_servicing_as_percent_of_disp_inc']]
correlation_matrix = new_df.corr()
correlation_matrix
| unemployment_rate | national_house_purchasing_A30%_income_spend | debt_servicing_as_percent_of_disp_inc | |
|---|---|---|---|
| unemployment_rate | 1.000000 | -0.161021 | -0.680698 |
| national_house_purchasing_A30%_income_spend | -0.161021 | 1.000000 | 0.485398 |
| debt_servicing_as_percent_of_disp_inc | -0.680698 | 0.485398 | 1.000000 |
"Percentage of households spending above 30% of income on housing" increase bigly before 2008 and decrease bigly after 2008. Obviously this was caused by the financial crisis. "Unemployment rate" is middle corelated with "Percentage of disposable income spent on household debt". When unemployment decreased, people are optimistic and will have higher loan to purchase the housing.
selected_columns = ['health_expenditure_per_capita_PPP']
for column in selected_columns:
plt.plot(df['year'], df[column], marker='o', linestyle='-', label=column)
plt.xlabel('Year',fontsize=16)
plt.ylabel('Value',fontsize=16)
plt.title('health expenditure per capita PPP',fontsize=18)
plt.legend()
plt.grid(True)
plt.show()
selected_columns = ['health_expenditure_as_percent_of_gdp']
for column in selected_columns:
plt.plot(df['year'], df[column], marker='o', linestyle='-', label=column)
plt.xlabel('Year',fontsize=16)
plt.ylabel('Value',fontsize=16)
plt.title('health expenditure as percent of gdp',fontsize=18)
plt.legend()
plt.grid(True)
plt.show()
from 2001 to 2016, nearly double of the expenditure per capita, and big increase of pecentage in GDP;
selected_columns = ['depression_adult', 'psychological_distress_adult', 'mood_anxiety_disorder_adult', 'healthy_weight_adult', 'veg_and_fruit_intake_adult','adult_diabetes_rates']
for column in selected_columns:
plt.plot(df['year'], df[column], marker='o', linestyle='-', label=column)
plt.xlabel('Year',fontsize=16)
plt.ylabel('Value',fontsize=16)
plt.title('adult health',fontsize=18)
plt.legend()
plt.grid(True)
plt.show()
df = df.reset_index(drop=True)
new_df = df.loc[30:35, [ 'veg_and_fruit_intake_adult','healthy_weight_adult', 'depression_adult', 'psychological_distress_adult', 'mood_anxiety_disorder_adult', 'adult_diabetes_rates']]
correlation_matrix = new_df.corr()
correlation_matrix
| veg_and_fruit_intake_adult | healthy_weight_adult | depression_adult | psychological_distress_adult | mood_anxiety_disorder_adult | adult_diabetes_rates | |
|---|---|---|---|---|---|---|
| veg_and_fruit_intake_adult | 1.000000 | 0.936210 | -0.808740 | -0.913669 | -0.895202 | -0.257291 |
| healthy_weight_adult | 0.936210 | 1.000000 | -0.803705 | -0.911935 | -0.875513 | -0.215869 |
| depression_adult | -0.808740 | -0.803705 | 1.000000 | 0.834509 | 0.956041 | -0.308078 |
| psychological_distress_adult | -0.913669 | -0.911935 | 0.834509 | 1.000000 | 0.844995 | 0.193556 |
| mood_anxiety_disorder_adult | -0.895202 | -0.875513 | 0.956041 | 0.844995 | 1.000000 | -0.192226 |
| adult_diabetes_rates | -0.257291 | -0.215869 | -0.308078 | 0.193556 | -0.192226 | 1.000000 |
"vegetable and fruit intake" and "healthy weight" are quite corelated, and highly negatively corelated with "depression" , "psychological distress" and "mode of anxiety disorder" . So that means for adult, less vegetable and fruit intake leads to bigger weight and related psychological health problems, but not so much to "adult diabetes";
selected_columns = [ 'vegetable_and_fruit_intake_child', 'healthy_weight_child','excellent_very_good_or_good_parent_rated_health_child','breakfast_eaten_at_home_less_5_days_a_week_child', 'emotional_behavioural_problems_child', 'depression_child']
for column in selected_columns:
plt.plot(df['year'], df[column], marker='o', linestyle='-', label=column)
plt.xlabel('Year',fontsize=16)
plt.ylabel('Value',fontsize=16)
plt.title('Children health',fontsize=18)
plt.legend()
plt.grid(True)
plt.show()
df = df.reset_index(drop=True)
new_df = df.loc[30:35, [ 'vegetable_and_fruit_intake_child', 'healthy_weight_child','breakfast_eaten_at_home_less_5_days_a_week_child', 'emotional_behavioural_problems_child', 'depression_child']]
correlation_matrix = new_df.corr()
correlation_matrix
| vegetable_and_fruit_intake_child | healthy_weight_child | breakfast_eaten_at_home_less_5_days_a_week_child | emotional_behavioural_problems_child | depression_child | |
|---|---|---|---|---|---|
| vegetable_and_fruit_intake_child | 1.000000 | 0.215014 | -0.874686 | -0.643957 | 0.367604 |
| healthy_weight_child | 0.215014 | 1.000000 | 0.239368 | -0.667914 | -0.545509 |
| breakfast_eaten_at_home_less_5_days_a_week_child | -0.874686 | 0.239368 | 1.000000 | 0.400366 | -0.528145 |
| emotional_behavioural_problems_child | -0.643957 | -0.667914 | 0.400366 | 1.000000 | 0.024971 |
| depression_child | 0.367604 | -0.545509 | -0.528145 | 0.024971 | 1.000000 |
"vegetable and fruit intake" and "breakfast eaten at home less than 5 days a week" are quite corelated, and highly negatively corelated with "emotional behavioural problems" . That means if children who eat less fruit or vegetable will fewer eat breakfast at home and this could leads to emotional behavioural problems. Different from adult, children who eat less fruit or vegetable is not so related to depression or health weight. Maybe children are not so rely on healthy food compared to adults.
selected_columns = ['unmet_after_hours_care_due_to_cost_adult', 'unmet_GP_care_due_to_cost_adult','unfilled_prescription_due_to_cost_child', 'unmet_need_for_after_hours_due_to_cost_child', 'unmet_need_for_GP_due_to_cost_child']
for column in selected_columns:
plt.plot(df['year'], df[column], marker='o', linestyle='-', label=column)
plt.xlabel('Year',fontsize=16)
plt.ylabel('Value',fontsize=16)
plt.title('unmet health care',fontsize=18)
plt.legend()
plt.grid(True)
plt.show()
compared to adult, health support for children is improving.
selected_columns = ['depression_adult','long_term_unemployment_rate']
for column in selected_columns:
plt.plot(df['year'], df[column], marker='o', linestyle='-', label=column)
plt.xlabel('Year',fontsize=16)
plt.ylabel('Value',fontsize=16)
plt.title('adult health',fontsize=18)
plt.legend()
plt.grid(True)
plt.show()
df = df.reset_index(drop=True)
new_df = df.loc[30:35, [ 'depression_adult','long_term_unemployment_rate']]
correlation_matrix = new_df.corr()
correlation_matrix
| depression_adult | long_term_unemployment_rate | |
|---|---|---|
| depression_adult | 1.000000 | 0.904932 |
| long_term_unemployment_rate | 0.904932 | 1.000000 |
-> Long-term unemployment refers to people who have been unemployed for 12 months or more. The long-term unemployment rate shows the proportion of these long-term unemployed among all unemployed. Long-term unemployment causes significant mental and material stress for those affected and their families. It is also of particular concern for policy makers, as high rates of long-term unemployment indicate that labour markets are operating inefficiently. This indicator is measured as a percentage of unemployed. "https://data.oecd.org/unemp/long-term-unemployment-rate.htm"
-> vegetable and fruit intake is much more important for adults than children, which can help control the weight and reduce the pyschological problems.
-> vegetable, fruit and breakfast is also important for children, maybe not that important as for adults.
-> adults depression rate is around 15% and still increasing. "long term unemployment rate" is quite related to this. So we should care much more of the psychological status of long term unemployed people, and provide more health support to this group.
df['year'] = pd.to_datetime(df['year'])
selected_columns = [ 'LIH_B50_CV_AHC', 'LIH_B60_CV_AHC',"LIEH_B50_CV_median_AHC","LIEH_B60_CV_median_AHC","child_poverty_ahc_perc_HH_B50_CV","child_poverty_ahc_perc_HH_B60_CV"]
for column in selected_columns:
plt.plot(df['year'], df[column], marker='o', linestyle='-', label=column)
plt.xlabel('Year',fontsize=16)
plt.ylabel('Value',fontsize=16)
plt.title('Poverty compared to 2007',fontsize=18)
plt.legend()
plt.grid(True)
plt.show()
df['year'] = pd.to_datetime(df['year'])
selected_columns = [ 'LIH_B50_contemp_median_AHC', 'LIH_B60_contemp_median_AHC',"LIEH_B50_contemp_median_AHC","LIEH_B60_contemp_median_AHC","child_poverty_ahc_perc_HH_B50_contemp_median","child_poverty_ahc_perc_HH_B60_contemp_median"]
for column in selected_columns:
plt.plot(df['year'], df[column], marker='o', linestyle='-', label=column)
plt.xlabel('Year',fontsize=16)
plt.ylabel('Value',fontsize=16)
plt.title('Poverty compared to contemporary median',fontsize=18)
plt.legend()
plt.grid(True)
plt.show()
df['year'] = pd.to_datetime(df['year'])
selected_columns = [ 'LIS65H_B50_CV_median_AHC', 'LIS65H_B60_CV_median_AHC',"LISPH_B50_CV_median_AHC","LISPH_B60_CV_median_AHC"]
for column in selected_columns:
plt.plot(df['year'], df[column], marker='o', linestyle='-', label=column)
plt.xlabel('Year',fontsize=16)
plt.ylabel('Value',fontsize=16)
plt.title('Poverty risk ratio for Single Under 65 and for solo parent',fontsize=18)
plt.legend()
plt.grid(True)
plt.show()
-> either we compare to 2007 median income or compare to contemporary median, we find that the elderly group's situation is best, then is the general households, the worst situation is the families with children. So more support for children and families with children is needed.
df['year'] = pd.to_datetime(df['year'])
selected_columns = ['rate_of_personal_insolvencies']
for column in selected_columns:
plt.plot(df['year'], df[column], marker='o', linestyle='-', label=column)
plt.xlabel('Year',fontsize=16)
plt.ylabel('Value',fontsize=16)
plt.title('personal insolvencies',fontsize=18)
plt.legend()
plt.grid(True)
plt.show()
df['year'] = pd.to_datetime(df['year'])
selected_columns = ['loan_delinquency_percent','unemployment_rate']
for column in selected_columns:
plt.plot(df['year'], df[column], marker='o', linestyle='-', label=column)
plt.xlabel('Year',fontsize=16)
plt.ylabel('Value',fontsize=16)
plt.title('Unemployment and loan delinquency percentage',fontsize=18)
plt.legend()
plt.grid(True)
plt.show()
df = df.reset_index(drop=True)
new_df = df.loc[25:34, ['unemployment_rate','loan_delinquency_percent','rate_of_personal_insolvencies']]
correlation_matrix = new_df.corr()
correlation_matrix
| unemployment_rate | loan_delinquency_percent | rate_of_personal_insolvencies | |
|---|---|---|---|
| unemployment_rate | 1.000000 | 0.256994 | 0.007644 |
| loan_delinquency_percent | 0.256994 | 1.000000 | 0.867414 |
| rate_of_personal_insolvencies | 0.007644 | 0.867414 | 1.000000 |
df = df.reset_index(drop=True)
new_df = df.loc[25:34, ['loan_delinquency_percent','rate_of_personal_insolvencies']]
correlation_matrix = new_df.corr()
correlation_matrix
| loan_delinquency_percent | rate_of_personal_insolvencies | |
|---|---|---|
| loan_delinquency_percent | 1.000000 | 0.867414 |
| rate_of_personal_insolvencies | 0.867414 | 1.000000 |
-> obviousely there is sudden big increase of "loan delinquency percentage" and "personal insolvencies" in 2008, which is mainly caused by financial crisis in 2008. There is some corelation between "unemployment rate" and "loan delinquency percentage". The "loan delinquency percentage" and "personal insolvencies" are quite corelated.
df['year'] = pd.to_datetime(df['year'])
selected_columns = ['total_prisoners_in_remand_rate', 'total_sentenced_prisoners_rate', 'total_post_sentence_offender_population_rate']
for column in selected_columns:
plt.plot(df['year'], df[column], marker='o', linestyle='-', label=column)
plt.xlabel('Year',fontsize=16)
plt.ylabel('Value',fontsize=16)
plt.title('Safety and security',fontsize=18)
plt.legend()
plt.grid(True)
plt.show()
df['year'] = pd.to_datetime(df['year'])
selected_columns = ['violent_crime_victimisations_rate', 'recorded_murders_and_homicides_per_million']
for column in selected_columns:
plt.plot(df['year'], df[column], marker='o', linestyle='-', label=column)
plt.xlabel('Year',fontsize=16)
plt.ylabel('Value',fontsize=16)
plt.title('Safety and security',fontsize=18)
plt.legend()
plt.grid(True)
plt.show()
df['year'] = pd.to_datetime(df['year'])
selected_columns = ['rate_of_personal_insolvencies','total_prisoners_in_remand_rate']
for column in selected_columns:
plt.plot(df['year'], df[column], marker='o', linestyle='-', label=column)
plt.xlabel('Year',fontsize=16)
plt.ylabel('Value',fontsize=16)
plt.title('rate of personal insolvencies and total prisoners in remand',fontsize=18)
plt.legend()
plt.grid(True)
plt.show()
df = df.reset_index(drop=True)
new_df = df.loc[25:34, ['total_prisoners_in_remand_rate','rate_of_personal_insolvencies','loan_delinquency_percent']]
correlation_matrix = new_df.corr()
correlation_matrix
| total_prisoners_in_remand_rate | rate_of_personal_insolvencies | loan_delinquency_percent | |
|---|---|---|---|
| total_prisoners_in_remand_rate | 1.000000 | 0.588088 | 0.577912 |
| rate_of_personal_insolvencies | 0.588088 | 1.000000 | 0.867414 |
| loan_delinquency_percent | 0.577912 | 0.867414 | 1.000000 |
-> obviousely there is an overall increasing trend of "total prisoners in remand rate" which shows overall safety is decreasing. But the serious crime such like "violent_crime" and "murder and and homicides" is decreasing.
-> "total prisoners in remand rate" is moderate corelated with "rate of personal_insolvencies" and "loan delinquency percent". So the poverty is moderate corelated with safety and security.
-> For the 8 groups, tried to sub-group the variables to understand each group more logically and clearly for valuable findings. Meanwhile also tried to provide recommendations to the policy markers based on the findings. More details please see "Key findings for each group" below.
-> Also tried to find the corelation of "Labor" with other 7 groups, as "Labor" is a very key indicator which is ofen used to indicate the economic status of one country, and it also influences many other indicators such like "income, housing, poverty" etc. For each group we find one or two representative variables for corelation analysis. This is not a perfect way, but it is interesting to find some potential relationships between them. This helps us to understand the reasons for changes of each group variables. So each group is not independent and influences each other.
-> Group 1 "Labour": Unemployment was continously decreased from 1993 to 2008, but increased from 2008 which was mainly caused by global financial crisis in 2008, and then decreased again from 2013. When financial crisis happened, the labour of young group (15-29 years old) is most influenced. "Labour share of income in MS-16 industries" is around 55%, which is more than half.
-> Group 2 "Education": It is very clear that education degree is closely related to earnings/income. Behind the earning/income is the "Labour" level. The teriary education participation rate continously increased before 2006, along with the continous decrease of "unemployment rate". This is a very good corelation, but from 2007-2008 both of their good trends have been changed. The "unemployment rate" decreased again from 2013, but the "teriary education participation rate" keeps decreasing without any change. The even higher education fee especially for university study and accordingly the even increasing teriary loan can be the main reason, although the goverment budget for eduction is also increasing (but the increasing is not that big compared to teriary loan and univeristy fee increase). So more support for the teriary education is needed: either more budget for teriary education to reduce the study fee or better policy for teriary loan etc. The teriary education participation rate will also contribute to the "Labour" and consequentially to othe parts such like "income, housing....", so high level education is very important and normally high degree leads to higher earnings.
-> Group 3 "General inequality": "gender pay gap" is a key indicator in this group. It follows the "unemployment rate" trend. When the "unemployment rate" changes, the "gender pay gap"also changes a few years later. When the "unemployment rate" decreases that means more labour is needed, so the woman can also get a good pay. In difficult time when the "umemployment rate" increases, the gender pay gap will become bigger a few years later. Of course "gender pay gap" is also influenced by others such like society development, social recognition of gender equality etc.
-> Group 4 "Income and Wealth": Income disbribution is very corelated with GINI, so will only take GINI for further analysis. If we take all the years' data for analysis, we found that GINI has nearly no corelation with "unemployment rate". If we take the data from 2008-2017 for analysis (not enough data for GINI before 2008), we will find that they are weakly corelated with 0.2. So main indicator for "Labour" (unemployment rate) will weakly influnce the main "Income and Wealth" indicator (GINI); The main reason could be that the GINI is sitll in good status under 0.4 (Gini coefficient between 0.3 and 0.4 is considered moderate inequality) and the "unemployment rate" may also take time to influence the GINI . Even during difficult time when "unemployment rate" is increasing, GINI does not change much. In general we see GINI is sligtly increasing along from 1982 with 0.27 to 2018 with 0.34. So more investigation and data is needed to find the reasons behind it.
-> Group 5 "Housing": 'home ownership rate' and 'house median multiples' are very negative correlated (-0.89). This is reasonable. When people need to work more years to afford the housing purchasing price, less people will purchase or own a house."Percentage of households spending above 30% of income on housing" increase bigly before 2008 and decrease bigly after 2008. Obviously this change was caused by the financial crisis. "Unemployment rate" is middle corelated with "Percentage of disposable income spent on household debt". When unemployment decreased, people are optimistic and are willing to afford higher loan to purchase the housing.
-> Group 6 "Health": From 2001 to 2016, nearly double of the health expenditure per capita and big increase of Health expenditure as a percentage of GDP; For adult "vegetable and fruit intake" and "healthy weight" are quite corelated, and highly negatively corelated with "depression" , "psychological distress" and "mode of anxiety disorder" . So that means for adult, less vegetable and fruit intake leads to bigger weight and related psychological health problems; "depression" is good corelated with "long-term unemployment rate". "long-term unemployment rate" refers to people who have been unemployed for 12 months or more and causes significant mental and material stress for those affected and their families (https://data.oecd.org/unemp/long-term-unemployment-rate.htm"). For children "vegetable and fruit intake" and "breakfast eaten at home less than 5 days a week" are quite corelated, and highly negatively corelated with "emotional behavioural problems". That means if children who eat less fruit or vegetable will fewer eat breakfast at home and this could leads to emotional behavioural problems. Different from adult, children who eat less fruit or vegetable is not so related to depression or health weight. Maybe children do not so rely on healthy food compared to adults. Compared to adult, health support for children is improving. So in general, we should encourage children and especially adults to eat more vegetables and fruits for physical and pyschological health, as both of them eat less now. Adults depression rate is around 15% and still increasing. "long term unemployment rate" is quite related to this. So we should care much more of the psychological status of long term unemployed people, and provide more health support to this group.
-> Group 7 "Poverty": Either we compare to 2007 median income or compare to contemporary median, we find that the elderly group's situation is best, then is the general households, the worst situation is the families with children. So more support for children and families with children is needed. Obviousely there is sudden big increase of "loan delinquency percentage" and "personal insolvencies" in 2008, which is mainly caused by financial crisis in 2008. There is some corelation between "unemployment rate" and "loan delinquency percentage", that means "Labour" status is also related to "Poverty".
-> Group 8 "Safety and security": Obviousely there is an overall increasing trend of "total prisoners in remand rate" which shows overall safety is decreasing. But the serious crime such like "violent_crime" and "murder and and homicides" is decreasing. "total prisoners in remand rate" is moderate corelated with "rate of personal insolvencies" and "loan delinquency percent". So "Safety and security" is moderate corelated with "Poverty". It is also influenced by many other sides such like wealthfare system, police system and education etc. , Poverty is just one reason of it.
from IPython.display import Image
Image(filename="D:/b.jpg", width=800, height=400)
-> we just chosed one or two key representative avriables of each group. One reason is to simplify the analysis, another reason is that different variable has different quantity of data. Simplily interpolation to get more data to compare all the variables at the same time will decrease the accuracy, as some variables have only very limited data.
-> we mainly analyze the corelation with "Labour", as "Labour" is very important indicator and directly influence other groups of variables. Of course there should also be corelation between other groups: "Income and wealth" with "Housing" and "Poverty" etc.
-> as "Income and wealth" here mainly talks about income distribution such like GINI, the corelation with "Labour" is weak. But if we talk about the exact income level, "Labour" should be quite related.
-> if we have more data or more variables, such like exact income/salary etc., it will help to find the corelation, but because of time reason, did not collect extra data for deeper analysis. The trail for corelation is also an interesting journey, alough that will not be easy and the result still needs to be further investigated.
1.SuĆĄnjak, T., Schumacher, C., Ali, A., Brook, A., Geertsema, P., Matthewson, J., Owens, R., & Smith, J. (2019b). Towards a global index of shared prosperity: a case study on New Zealand. https://doi.org/10.33217/keh/sharedprosperity/001/05.2019
2.Unemployment - Long-term unemployment rate - OECD Data. (n.d.-b). theOECD. https://data.oecd.org/unemp/long-term-unemployment-rate.htm
3.Chen, J. (2022, September 26). Underemployment: definition, causes, and example. Investopedia. https://www.investopedia.com/terms/u/underemployment.asp#:~:text=The%20U.S.%20unemployment%20rate%20was%2013.3%25%20as%20of,is%20the%20sum%20of%20the%20employed%20and%20unemployed%29.%22